Merge rows in tibble

【Title】: Merge rows in tibble 【Posted】: 2020-07-09 22:25:04 【Question】:

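I want to list all the functions in my package in a table.

So far, I have extracted all function names and titles from the package help documentation: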

library(magrittr)
package_info <- library(help = magrittr)$info[[2]]
package_info_tbl <- package_info %>% 
  stringr::str_split(pattern = "\\s+", n = 2, simplify = TRUE) %>%
  tibble::as_tibble(.name_repair = "minimal")
colnames(package_info_tbl) <- c("Function", "Title")

package_info_tbl
#> # A tibble: 13 x 2
#>    Function     Title                                          
#>    <chr>        <chr>                                          
#>  1 "%$%"        magrittr exposition pipe-operator              
#>  2 "%<>%"       magrittr compound assignment pipe-operator     
#>  3 "%>%"        magrittr forward-pipe operator                 
#>  4 "%T>%"       magrittr tee operator                          
#>  5 "[[.fseq"    Extract function(s) from a functional sequence.
#>  6 "debug_fseq" Debugging function for functional sequences.   
#>  7 "debug_pipe" Debugging function for magrittr pipelines.     
#>  8 "extract"    Aliases                                        
#>  9 "freduce"    Apply a list of functions sequentially         
#> 10 "functions"  Extract the function list from a functional    
#> 11 ""           sequence.                                      
#> 12 "magrittr"   magrittr - Ceci n'est pas un pipe              
#> 13 "print.fseq" Print method for functional sequence.

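Created on 2020-03-29 by the reprex package (v0.3.0)

I noticed that some entries are split: when a title is long, it wraps onto two or more rows, and the continuation rows have an empty Function. How can I merge these rows?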

【Answer 1】:

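We can replace the empty strings with NA, use fill to replace those NAs with the previous value in the Function column, then group_by Function and build one concatenated string per Function.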

library(dplyr)

package_info_tbl %>%
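  # na_if() on a whole data frame errors in dplyr 1.1.0 and later, so target the column
  mutate(Function = na_if(Function, "")) %>%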
  tidyr::fill(Function)  %>%
  group_by(Function) %>%
  summarise(Title = paste(Title, collapse = " "))


# A tibble: 12 x 2
#   Function   Title                                                
#   <chr>      <chr>                                                
# 1 [[.fseq    Extract function(s) from a functional sequence.      
# 2 %<>%       magrittr compound assignment pipe-operator           
# 3 %>%        magrittr forward-pipe operator                       
# 4 %$%        magrittr exposition pipe-operator                    
# 5 %T>%       magrittr tee operator                                
# 6 debug_fseq Debugging function for functional sequences.         
# 7 debug_pipe Debugging function for magrittr pipelines.           
# 8 extract    Aliases                                              
# 9 freduce    Apply a list of functions sequentially               
#10 functions  Extract the function list from a functional sequence.
#11 magrittr   magrittr - Ceci n'est pas un pipe                    
#12 print.fseq Print method for functional sequence.               
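The three steps (blank to NA, fill down, collapse per group) can be checked on a small hand-made table. This is only a sketch: the `toy` tibble below is hypothetical data, not the real magrittr help index, and `.groups = "drop"` assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: row 2 is the wrapped tail of the title in row 1
toy <- tibble::tibble(
  Function = c("functions", "", "magrittr"),
  Title    = c("Extract the function list from a functional",
               "sequence.",
               "magrittr - Ceci n'est pas un pipe")
)

merged <- toy %>%
  mutate(Function = na_if(Function, "")) %>%  # "" becomes NA
  fill(Function) %>%                          # NA takes the value from the row above
  group_by(Function) %>%
  summarise(Title = paste(Title, collapse = " "), .groups = "drop")

merged  # two rows; the "functions" title is now one string
```

Note that summarise returns the groups sorted by Function, so the original row order is not preserved.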

【Answer 2】:

If Function is empty, fill it with the value from the previous row. Then collapse Title across rows that share the same Function.

library(dplyr)

# Carry the last non-empty Function name forward over the blank continuation rows
package_info_tbl$Function <- Reduce(function(x, y) if (y == "") x else y,
                                    package_info_tbl$Function, accumulate = TRUE)

package_info_tbl <- package_info_tbl %>% 
  group_by(Function) %>%
  summarise(Title = paste(Title, collapse = " "))

Or, folded into your dplyr chain:

package_info_tbl <- package_info %>% 
      stringr::str_split(pattern = "\\s+", n = 2, simplify = TRUE) %>%
      tibble::as_tibble(.name_repair = "minimal") %>%
      setNames(., c("Function", "Title")) %>%
      # carry the last non-empty Function name forward
      mutate(Function = Reduce(function(x, y) if (y == "") x else y, Function, accumulate = TRUE)) %>%
      group_by(Function) %>%
      summarise(Title = paste(Title, collapse = " ")) %>%
      ungroup()

Output

package_info_tbl

# # A tibble: 12 x 2
#    Function   Title                                                
#    <chr>      <chr>                                                
#  1 %$%        magrittr exposition pipe-operator                    
#  2 %<>%       magrittr compound assignment pipe-operator           
#  3 %>%        magrittr forward-pipe operator                       
#  4 %T>%       magrittr tee operator                                
#  5 [[.fseq    Extract function(s) from a functional sequence.      
#  6 debug_fseq Debugging function for functional sequences.         
#  7 debug_pipe Debugging function for magrittr pipelines.           
#  8 extract    Aliases                                              
#  9 freduce    Apply a list of functions sequentially               
# 10 functions  Extract the function list from a functional sequence.
# 11 magrittr   magrittr - Ceci n'est pas un pipe                    
# 12 print.fseq Print method for functional sequence.  
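To see what the Reduce call in this answer does on its own, here is a minimal base R sketch. The input vector and the helper name `carry_forward` are made up for illustration.

```r
# Hypothetical input: "" marks a continuation row
x <- c("freduce", "functions", "", "", "magrittr")

# Keep the running value when the next element is empty, otherwise switch to it
carry_forward <- function(prev, cur) if (cur == "") prev else cur

# accumulate = TRUE returns every intermediate result, one per input element
filled <- Reduce(carry_forward, x, accumulate = TRUE)

filled
# filled is c("freduce", "functions", "functions", "functions", "magrittr")
```

Because every intermediate result has length one, Reduce simplifies the result to a plain character vector of the same length as the input, which is why it can be assigned straight back to the column.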

【Answer 3】:

We can also use str_c:

library(dplyr)
library(tidyr)
library(stringr)
package_info_tbl %>%
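  # na_if() on a whole data frame errors in dplyr 1.1.0 and later, so target the column
  mutate(Function = na_if(Function, "")) %>%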
  fill(Function)  %>%
  group_by(Function) %>%
  summarise(Title = str_c(Title, collapse = " "))
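One difference between paste and str_c that may matter here is how they treat missing values. The sketch below uses a hypothetical `titles` vector containing an NA, which could arise if an empty Title were also converted to NA.

```r
library(stringr)

# Hypothetical input with one missing piece
titles <- c("Extract the function list", NA)

pasted   <- paste(titles, collapse = " ")  # "Extract the function list NA"
combined <- str_c(titles, collapse = " ")  # NA: a missing piece makes the whole result missing
```

If every Title is a non-missing string, as in the table from the question, both functions should give the same result.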

【Answer 4】:

You can aggregate these rows with summarise. Before that, mark which rows belong together. A simple LOCF (last observation carried forward) is enough:

library("zoo")
library(tidyr)
library(magrittr)
library(dplyr)
package_info <- library(help = magrittr)$info[[2]]
package_info_tbl <- package_info %>% 
  stringr::str_split(pattern = "\\s+", n = 2, simplify = TRUE) %>%
  # set colnames
  `colnames<-`(c("Function", "Title")) %>% 
  tibble::as_tibble() %>% 
  # explicit NAs
  dplyr::mutate(Function = if_else(Function == "", NA_character_, Function),
                # replace NAs with prior value
                Function = zoo::na.locf(Function)) %>% 
  # paste together the strings for each function
  group_by(Function) %>% 
  summarise(Title = paste(Title, collapse = " "))
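One detail worth knowing about zoo::na.locf is that it drops leading NAs by default (`na.rm = TRUE`), which would shorten the vector inside mutate if the very first Function were empty. That does not happen with the data in the question, so the sketch below uses a hypothetical vector to show the effect.

```r
library(zoo)

# Hypothetical input whose first entry is missing
x <- c(NA, "extract", NA, "freduce")

trimmed <- na.locf(x)                 # leading NA removed, so the length drops to 3
kept    <- na.locf(x, na.rm = FALSE)  # NA "extract" "extract" "freduce"
```

Passing `na.rm = FALSE` keeps the output the same length as the input, which is the safer choice when the result is assigned to a column.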
