How to sum rows based on multiple conditions - R? [duplicate]

Posted: 2015-05-09 11:56:05

Question:

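I have a data frame containing a plot ID (plotID), a tree species code (species), and a cover value (cover). Some plots contain more than one record for the same species. How can I sum the cover field whenever a species is repeated within a plot?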

For example, here is some sample data:

# Sample Data
plotID = c( "SUF200001035014", "SUF200001035014", "SUF200001035014", "SUF200001035014", "SUF200001035014", "SUF200046012040",
       "SUF200046012040", "SUF200046012040", "SUF200046012040", "SUF200046012040", "SUF200046012040", "SUF200046012040")
species = c("ABBA",  "BEPA",  "PIBA2", "PIMA",  "PIRE",  "PIBA2", "PIBA2", "PIMA",  "PIMA",  "PIRE",  "POTR5", "POTR5")
cover = c(26.893939,  5.681818,  9.469697, 16.287879,  1.893939, 16.287879,  4.166667, 10.984848, 16.666667, 11.363636, 18.181818,
          13.257576)
df_original = data.frame(plotID, species, cover)

This is the intended output:

# Intended Output
plotID2 = c( "SUF200001035014", "SUF200001035014", "SUF200001035014", "SUF200001035014", "SUF200001035014", "SUF200046012040",
            "SUF200046012040", "SUF200046012040", "SUF200046012040")
species2 = c("ABBA",  "BEPA",  "PIBA2", "PIMA",  "PIRE",  "PIBA2", "PIMA",  "PIRE",  "POTR5")
cover2 = c(26.893939,  5.681818,  9.469697, 16.287879,  1.893939, 20.454546, 27.651515, 11.363636, 31.439394)
df_intended_output = data.frame(plotID2, species2, cover2)
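
A quick way to see exactly which rows need combining is to flag every row whose (plotID, species) pair occurs more than once. This is a base R sketch added for illustration, not part of the original question; it rebuilds the sample data above so it runs on its own.

```r
# Sample data from the question
df_original <- data.frame(
  plotID  = rep(c("SUF200001035014", "SUF200046012040"), times = c(5, 7)),
  species = c("ABBA", "BEPA", "PIBA2", "PIMA", "PIRE", "PIBA2", "PIBA2",
              "PIMA", "PIMA", "PIRE", "POTR5", "POTR5"),
  cover   = c(26.893939, 5.681818, 9.469697, 16.287879, 1.893939, 16.287879,
              4.166667, 10.984848, 16.666667, 11.363636, 18.181818, 13.257576),
  stringsAsFactors = FALSE
)

# A row counts as a duplicate if its (plotID, species) pair appears
# earlier or later in the data, so check in both directions
pair   <- df_original[c("plotID", "species")]
is_dup <- duplicated(pair) | duplicated(pair, fromLast = TRUE)

df_original[is_dup, ]
```

All six flagged rows belong to SUF200046012040 (two each of PIBA2, PIMA and POTR5). PIBA2 also occurs in SUF200001035014, but only once there, so that row is not flagged.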

Answer 1:

It's easy with aggregate:

aggregate(cover~species+plotID, data=df_original, FUN=sum) 
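
aggregate() returns one row per group, with the first grouping variable in the formula varying fastest. Putting species before plotID therefore lists the groups plot by plot, in the same order as the intended output. A small sketch, with a column reordering added for illustration, that also puts plotID first to match the question's layout:

```r
# Sample data from the question
df_original <- data.frame(
  plotID  = rep(c("SUF200001035014", "SUF200046012040"), times = c(5, 7)),
  species = c("ABBA", "BEPA", "PIBA2", "PIMA", "PIRE", "PIBA2", "PIBA2",
              "PIMA", "PIMA", "PIRE", "POTR5", "POTR5"),
  cover   = c(26.893939, 5.681818, 9.469697, 16.287879, 1.893939, 16.287879,
              4.166667, 10.984848, 16.666667, 11.363636, 18.181818, 13.257576),
  stringsAsFactors = FALSE
)

# One row per (species, plotID) group; species varies fastest,
# so the rows come out plot by plot
res <- aggregate(cover ~ species + plotID, data = df_original, FUN = sum)

# Reorder the columns to match the intended output (plotID, species, cover)
res <- res[, c("plotID", "species", "cover")]
res
```

For PIMA in SUF200046012040 this gives 10.984848 + 16.666667 = 27.651515.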

It's even easier with data.table:

library(data.table)
as.data.table(df_original)[, .(cover = sum(cover)), by = .(plotID, species)]

Answer 2:

You can do this in several ways; base R, dplyr, and data.table are the most common.

Here is the dplyr way:

library(dplyr)

df_original %>% group_by(plotID, species) %>% summarize(cover = sum(cover))

#          plotID species     cover
#1 SUF200001035014    ABBA 26.893939
#2 SUF200001035014    BEPA  5.681818
#3 SUF200001035014   PIBA2  9.469697
#4 SUF200001035014    PIMA 16.287879
#5 SUF200001035014    PIRE  1.893939
#6 SUF200046012040   PIBA2 20.454546
#7 SUF200046012040    PIMA 27.651515
#8 SUF200046012040    PIRE 11.363636
#9 SUF200046012040   POTR5 31.439394

This would be the base R way:

aggregate(df_original$cover, by=list(df_original$plotID, df_original$species), FUN=sum)
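
With an unnamed by list, the result columns are labelled Group.1, Group.2 and x. A variant, offered as an illustration rather than part of the original answer, keeps readable names by passing the value column as a one-column data frame and naming the elements of by. Because the first grouping variable varies fastest, the rows are sorted explicitly afterwards.

```r
# Sample data from the question
df_original <- data.frame(
  plotID  = rep(c("SUF200001035014", "SUF200046012040"), times = c(5, 7)),
  species = c("ABBA", "BEPA", "PIBA2", "PIMA", "PIRE", "PIBA2", "PIBA2",
              "PIMA", "PIMA", "PIRE", "POTR5", "POTR5"),
  cover   = c(26.893939, 5.681818, 9.469697, 16.287879, 1.893939, 16.287879,
              4.166667, 10.984848, 16.666667, 11.363636, 18.181818, 13.257576),
  stringsAsFactors = FALSE
)

# A one-column data frame keeps the name "cover"; naming the list
# elements in `by` replaces the default Group.1 / Group.2 headers
res <- aggregate(df_original["cover"],
                 by = list(plotID  = df_original$plotID,
                           species = df_original$species),
                 FUN = sum)

# plotID varies fastest here, so sort to list the rows plot by plot
res <- res[order(res$plotID, res$species), ]
rownames(res) <- NULL
res
```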

And here is a data.table way:

    library(data.table)
    DT <- as.data.table(df_original)
    DT[, lapply(.SD,sum), by = "plotID,species"]
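
lapply(.SD, sum) sums every column that is not listed in by, which in this data is only cover. A rough base R counterpart is the dot formula in aggregate(). In the sketch below, a helper column n (not part of the question's data) is set to 1 for every row, so its group sum reports how many records were merged into each result row.

```r
# Sample data from the question
df_original <- data.frame(
  plotID  = rep(c("SUF200001035014", "SUF200046012040"), times = c(5, 7)),
  species = c("ABBA", "BEPA", "PIBA2", "PIMA", "PIRE", "PIBA2", "PIBA2",
              "PIMA", "PIMA", "PIRE", "POTR5", "POTR5"),
  cover   = c(26.893939, 5.681818, 9.469697, 16.287879, 1.893939, 16.287879,
              4.166667, 10.984848, 16.666667, 11.363636, 18.181818, 13.257576),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: one per record, so its sum is a row count
df_original$n <- 1

# "." on the left expands to every column not used for grouping
# (cover and n here), so each of them is summed per group
res <- aggregate(. ~ plotID + species, data = df_original, FUN = sum)

# Sort plot by plot for readability
res <- res[order(res$plotID, res$species), ]
rownames(res) <- NULL
res
```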

Answer 3:

As above, but with ddply from the plyr package:

    library(plyr)
    ddply(df_original, c("plotID","species"), summarise,cover2= sum(cover))


            plotID          species cover2
    1       SUF200001035014 ABBA    26.893939
    2       SUF200001035014 BEPA    5.681818
    3       SUF200001035014 PIBA2   9.469697
    4       SUF200001035014 PIMA    16.287879
    5       SUF200001035014 PIRE    1.893939
    6       SUF200046012040 PIBA2   20.454546
    7       SUF200046012040 PIMA    27.651515
    8       SUF200046012040 PIRE    11.363636
    9       SUF200046012040 POTR5   31.439394
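
ddply follows the split-apply-combine pattern. For comparison, here is a sketch of the same three steps spelled out in base R with split(), lapply() and do.call(rbind, ...). It is an illustration of the pattern, not a description of how plyr works internally; the cover2 column name simply mirrors the answer above.

```r
# Sample data from the question
df_original <- data.frame(
  plotID  = rep(c("SUF200001035014", "SUF200046012040"), times = c(5, 7)),
  species = c("ABBA", "BEPA", "PIBA2", "PIMA", "PIRE", "PIBA2", "PIBA2",
              "PIMA", "PIMA", "PIRE", "POTR5", "POTR5"),
  cover   = c(26.893939, 5.681818, 9.469697, 16.287879, 1.893939, 16.287879,
              4.166667, 10.984848, 16.666667, 11.363636, 18.181818, 13.257576),
  stringsAsFactors = FALSE
)

# Split: one data frame per (plotID, species) pair; drop = TRUE skips
# combinations that never occur (e.g. POTR5 in the first plot)
pieces <- split(df_original, list(df_original$plotID, df_original$species),
                drop = TRUE)

# Apply: collapse each piece to a single row holding the summed cover
summed <- lapply(pieces, function(d) {
  data.frame(plotID  = d$plotID[1],
             species = d$species[1],
             cover2  = sum(d$cover),
             stringsAsFactors = FALSE)
})

# Combine: stack the one-row results and sort plot by plot
res <- do.call(rbind, summed)
res <- res[order(res$plotID, res$species), ]
rownames(res) <- NULL
res
```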
