Convert a column into multiple columns by group

Posted 2023-02-16

【Title】: Convert a column into multi column by groups 【Posted】: 2015-05-17 16:56:31 【Question】:

I have a data frame (df):

group col
a     12
a     15
a     13
b     21
b     23

The desired output is also a data frame (df1):

col1 col2
12   21
15   23
13   0

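Namely, I want to split the "col" column of "df" by "group" into multiple columns, such as "col1" and "col2".

When the resulting columns have unequal lengths, "0" must be appended to the end of the shorter columns until every column reaches the maximum column length.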
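For reproducibility, the input can be built as below. This is only a minimal sketch based on the table in the question; the numeric column type is an assumption, and the answers refer to the same input as `df1` or `DF`.

```r
# Input data frame from the question (column types are assumed)
df <- data.frame(group = c("a", "a", "a", "b", "b"),
                 col   = c(12, 15, 13, 21, 23))

# Group sizes differ, so the shorter group needs padding
sizes <- table(df$group)
sizes

# Number of rows the result must have
max(sizes)
```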

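【Question discussion】:

【Answer 1】:

We can use the base R functions split or unstack to split 'col' by 'group' into a list, then pad the list elements that are shorter than the longest element with NA. Finally, rename the columns and replace the NAs with 0. Note that the code below uses `df1` as the name of the input data frame.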

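  # Split 'col' by 'group'; with unequal group sizes unstack() returns a list
  lst <- unstack(df1, col~group)
  # Extend every element to the longest length; the extra slots become NA
  d1 <- as.data.frame(sapply(lst, `length<-`, max(sapply(lst, length))))
  # Replace the NA padding with 0 and rename the columns
  d1[is.na(d1)] <- 0
  colnames(d1) <- paste0('col', 1:ncol(d1))
  d1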
 #  col1 col2
 #1   12   21
 #2   15   23
 #3   13    0
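The answer mentions split as an alternative to unstack but only shows the latter. A minimal sketch of the split variant follows; it pads with zeros directly instead of going through NA, which is a small deviation from the code above. The input is rebuilt here under the name `df1` so the block runs on its own.

```r
# Input data frame as given in the question (named df1 as in the answer)
df1 <- data.frame(group = c("a", "a", "a", "b", "b"),
                  col   = c(12, 15, 13, 21, 23))

# split() returns a named list with one vector per group
lst <- split(df1$col, df1$group)

# The longest group determines the number of rows
n <- max(lengths(lst))

# Pad each vector with zeros up to length n
padded <- lapply(lst, function(x) c(x, rep(0, n - length(x))))

# Combine into a data frame and rename the columns
d1 <- as.data.frame(padded)
colnames(d1) <- paste0("col", seq_along(d1))
d1
```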

Or use stri_list2matrix from stringi:

library(stringi)
d1 <- as.data.frame(stri_list2matrix(unstack(df1, col~group),
            fill=0), stringsAsFactors=FALSE)
d1[] <- lapply(d1, as.numeric)

Or use data.table/splitstackshape:

library(data.table)   # provides setnames() and dcast() for data.tables
library(splitstackshape)
setnames(dcast(getanID(df1, 'group'), .id~group, value.var='col',
             fill=0L)[, .id:= NULL], paste0('col', 1:2))[]
#    col1 col2
#1:   12   21
#2:   15   23
#3:   13    0

【Discussion】:

【Answer 2】:

How about using dplyr...

library(dplyr)
library(tidyr)

df1 %>%
  group_by(group) %>%
  mutate(n = row_number()) %>%
  spread(group, col) %>%
  select(-n) %>%
  # replace the NAs left by spread() with 0 and return the data frame
  (function(x) { x[is.na(x)] <- 0; x })
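With current tidyr, the same idea can be sketched with pivot_wider, whose values_fill argument supplies the zero padding so no separate NA replacement step is needed. This swaps spread for pivot_wider and is not the answer's original code; the input is rebuilt as `df1` so the block is self-contained, and the resulting columns keep the group names a and b.

```r
library(dplyr)
library(tidyr)

# Input data frame as given in the question (named df1 as in the answer)
df1 <- data.frame(group = c("a", "a", "a", "b", "b"),
                  col   = c(12, 15, 13, 21, 23))

d1 <- df1 %>%
  group_by(group) %>%
  mutate(n = row_number()) %>%   # position of each value within its group
  ungroup() %>%
  # one column per group; missing positions are filled with 0
  pivot_wider(names_from = group, values_from = col, values_fill = 0) %>%
  select(-n)
d1
```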

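【Discussion】:

I was working on a similar solution, but replaced the NAs using mutate_each(funs(ifelse(is.na(.), 0, .))). Of course! It looks more elegant. On the other hand - subsetting versus ifelse ...

【Answer 3】:

Since you are padding with zeros, another idea: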

xtabs(col ~ ave(DF$col, DF$group, FUN = seq_along) + group, DF)
#                                      group
#ave(DF$col, DF$group, FUN = seq_along)  a  b
#                                     1 12 21
#                                     2 15 23
#                                     3 13  0

where "DF" is:

DF = structure(list(group = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("a", 
"b"), class = "factor"), col = c(12L, 15L, 13L, 21L, 23L)), .Names = c("group", 
"col"), class = "data.frame", row.names = c(NA, -5L))
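The xtabs call returns a contingency table rather than a data frame. If a plain data frame with columns col1 and col2 is wanted, as in the question, one possible sketch is shown below. The helper column `idx` is introduced here for illustration only, and DF is rebuilt with data.frame() instead of structure() for brevity.

```r
# Same data as DF above, built with data.frame()
DF <- data.frame(group = factor(c("a", "a", "a", "b", "b")),
                 col   = c(12L, 15L, 13L, 21L, 23L))

# Position of each value within its group (hypothetical helper column)
DF$idx <- ave(DF$col, DF$group, FUN = seq_along)

# Cross-tabulate: empty position/group cells come out as 0
tab <- xtabs(col ~ idx + group, DF)

# Convert the 2-d table to a data frame and rename the columns
d1 <- as.data.frame.matrix(tab)
colnames(d1) <- paste0("col", seq_len(ncol(d1)))
d1
```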

【Discussion】:
