R: Merge rows in the same data table, concatenating certain columns

Posted 2023-02-24


Question (posted 2012-07-07 01:51:07):

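I have my data in a data table in R. I want to merge the rows that share the same customerID and concatenate the elements of the other columns being merged.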

I want to go from this:

   title  author customerID
1 title1 author1          1
2 title2 author2          2
3 title3 author3          1

to this:

           title           author Group.1
1 title1, title3 author1, author3       1
2         title2          author2       2


Answer 1:

Maybe not the best solution, but it is easy to understand:

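# Example data: ids 1 and 2 each appear twice
df <- data.frame(author=LETTERS[1:5], title=LETTERS[1:5], id=c(1, 2, 1, 2, 3), stringsAsFactors=FALSE)

uniqueIds <- unique(df$id)

# Reuse the first length(uniqueIds) rows as a template for the result
mergedDf <- df[seq_along(uniqueIds), ]

# Fill one result row per unique id
for (i in seq_along(uniqueIds)) {
    mergedDf[i, "id"] <- uniqueIds[i]
    # Join all values belonging to this id into one comma-separated string
    mergedDf[i, "author"] <- paste(df[df$id == uniqueIds[i], "author"], collapse=",")
    mergedDf[i, "title"] <- paste(df[df$id == uniqueIds[i], "title"], collapse=",")
}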

mergedDf
#  author title id
#1    A,C   A,C  1
#2    B,D   B,D  2
#3      E     E  3
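The same per-group concatenation can be written without an explicit loop. The sketch below is one possible alternative, not the answer author's code: it recreates the example df so it runs on its own and uses the formula interface of base R's aggregate(), where `. ~ id` groups every other column by id and the extra argument collapse = "," is passed through to paste().

```r
# Same example data as above
df <- data.frame(author = LETTERS[1:5], title = LETTERS[1:5],
                 id = c(1, 2, 1, 2, 3), stringsAsFactors = FALSE)

# ". ~ id" aggregates all remaining columns (author, title) by id;
# collapse = "," is forwarded to paste(), giving one string per group
mergedDf2 <- aggregate(. ~ id, data = df, FUN = paste, collapse = ",")
mergedDf2
#   id author title
# 1  1    A,C   A,C
# 2  2    B,D   B,D
# 3  3      E     E
```

Because paste(..., collapse = ",") returns a single string for every group, the result has ordinary character columns, whereas FUN = c leaves list columns whenever the groups differ in size.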

Comments:

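Nice, but R has some built-in functions for working with grouped data. The best fit here is aggregate(df[-3], by=list(df$id), c), although by(df[-3], df$id, c) gives the same result, just in a completely different format.

@mrdwab: Thanks, I don't work with data frames often and didn't know about the aggregate function.

Answer 2: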

The aggregate function should help you find a solution:

dat = data.frame(title = c("title1", "title2", "title3"),
                 author = c("author1", "author2", "author3"),
                 customerID = c(1, 2, 1))
aggregate(dat[-3], by=list(dat$customerID), c)
#   Group.1 title author
# 1       1  1, 3   1, 3
# 2       2     2      2

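The numbers appear because title and author were created as factors, and in the R version used here c() on a factor returned its integer codes. To get the strings instead, make sure you add stringsAsFactors = FALSE when creating the data frame (this has been the default since R 4.0.0). If your data are already factors, you can first convert them to character with something like dat[c(1, 2)] = apply(dat[-3], 2, as.character), and then: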

aggregate(dat[-3], by=list(dat$customerID), c)
#   Group.1          title           author
# 1       1 title1, title3 author1, author3
# 2       2         title2          author2
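If you want each group as a single string rather than a list column, a possible variant is sketched below. It is an assumption-based alternative, not part of the original answer: it simulates factor data with stringsAsFactors = TRUE, converts the columns with lapply() (which keeps the data frame instead of going through the character matrix that apply() builds), and uses base R's toString(), which joins values with ", ".

```r
# Simulate data that are already factors
dat <- data.frame(title = c("title1", "title2", "title3"),
                  author = c("author1", "author2", "author3"),
                  customerID = c(1, 2, 1),
                  stringsAsFactors = TRUE)

# Convert the factor columns back to character, column by column
dat[1:2] <- lapply(dat[1:2], as.character)

# toString() collapses each group's values into one ", "-separated string
res <- aggregate(dat[-3], by = list(dat$customerID), FUN = toString)
res
#   Group.1          title           author
# 1       1 title1, title3 author1, author3
# 2       2         title2          author2
```

The printed result looks the same as the list-column version above, but res$title and res$author are plain character vectors, which are easier to write to CSV or compare.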

Comments:

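@HarryPalmer, I'm not sure I understand your follow-up question. Assuming you have assigned the output of aggregate to another object, e.g. temp, then temp$title will be a list (in this example, something like list(`0` = c("title1", "title3"), `1` = "title2")). The title and author columns are lists. Is that what you were looking for?

Hmm, I think I understand now, thanks. I was confused by the data types. One more question, please: how do I remove the duplicates that appear in the list elements of the columns/rows after aggregating? I tried data1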
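The follow-up is cut off, so the exact goal is unknown. Assuming it means dropping repeated values within each customer's group, two base R options are sketched below; the example data (customer 1 listing "title1" twice) is hypothetical.

```r
# Hypothetical data: customer 1 has the same title/author twice
dat <- data.frame(title = c("title1", "title2", "title1"),
                  author = c("author1", "author2", "author1"),
                  customerID = c(1, 2, 1),
                  stringsAsFactors = FALSE)

# With FUN = c, groups of different sizes produce list columns
temp <- aggregate(dat[-3], by = list(dat$customerID), c)
is.list(temp$title)  # TRUE

# Option 1: remove duplicates inside each list element afterwards
temp$title  <- lapply(temp$title, unique)
temp$author <- lapply(temp$author, unique)

# Option 2: deduplicate while aggregating, one string per group
dedup <- aggregate(dat[-3], by = list(dat$customerID),
                   FUN = function(x) toString(unique(x)))
dedup
#   Group.1  title  author
# 1       1 title1 author1
# 2       2 title2 author2
```

Option 1 keeps the list structure the comment above describes; option 2 is usually simpler if the end result is meant to be read or exported as text.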
