Groupwise computation in R [duplicate]

Posted 2023-02-16

Asked: 2017-01-20 14:48:39

Question:

I have grouped and summarized a data frame in R, so I now have a table like this:

Group | Value | Count
==========================
   A  |   1   |   4
   A  |   2   |   2
   A  |   10  |   4
   B  |   3   |   2
   B  |   4   |   4
   B  |   2   |   3
   C  |   5   |   3
   C  |   2   |   6
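
For anyone who wants to run the answers, the table above can be rebuilt as a data frame. This is a minimal sketch that assumes character groups and numeric columns; the name df1 is the one most answers use (Answers 3 and 4 call the same data df).

```r
# Rebuild the example table from the question.
# Column types are an assumption; the question only shows the printed table.
df1 <- data.frame(
  Group = c("A", "A", "A", "B", "B", "B", "C", "C"),
  Value = c(1, 2, 10, 3, 4, 2, 5, 2),
  Count = c(4, 2, 4, 2, 4, 3, 3, 6),
  stringsAsFactors = FALSE
)

# Answers 3 and 4 refer to the same data as df
df <- df1
```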

I am interested in finding the relative frequency of the value 2 within each group:

Group | Relative freq of 2
==========================
   A  |  2/(4+2+4) = 0.2
   B  |  3/(2+4+3) = 0.33
   C  |  6/(3+6) = 0.67

Is there a simple, elegant way to compute this in R, other than writing a bunch of code with loops and conditionals? Possibly using dplyr.

Answer 1:

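Using dplyr, after grouping by 'Group', we subset the 'Count' where 'Value' is 2 (assuming there is only a single 'Value' of 2 per 'Group') and divide it by the sum of 'Count':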

library(dplyr)
df1 %>%
   group_by(Group) %>% 
   summarise(RelFreq = round(Count[Value==2]/sum(Count), 2))
#  Group RelFreq
#  <chr>   <dbl>
#1     A    0.20
#2     B    0.33
#3     C    0.67

The corresponding data.table option is:

library(data.table)
setDT(df1)[, .(RelFreq = round(Count[Value == 2]/sum(Count),2)), by = Group]
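
The dplyr version in this answer assumes exactly one row with 'Value' 2 per group. If a group has none, or more than one, Count[Value == 2] no longer yields a single number for that group. A hedged sketch of one way around this is to wrap the subset in sum(), as Answer 2 does in base R. The data frame toy below is hypothetical and is not part of the original question.

```r
library(dplyr)

# Hypothetical data: group A has two rows with Value 2, group B has none
toy <- data.frame(
  Group = c("A", "A", "A", "B", "B"),
  Value = c(2, 2, 10, 3, 4),
  Count = c(1, 2, 7, 2, 4)
)

res <- toy %>%
  group_by(Group) %>%
  # sum() collapses zero, one, or several matching rows to one number
  summarise(RelFreq = round(sum(Count[Value == 2]) / sum(Count), 2))

res
```

Here group A gives (1 + 2) / 10 and group B gives 0 / 6, so every group still produces one row.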

Answer 2:

Here is a base R solution:

sapply(split(df1, df1$Group), 
   function(x) round(sum(x$Count[x$Value == 2]) / sum(x$Count), 2))

##  A    B    C 
## 0.20 0.33 0.67
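
The same ratio can also be computed in base R without splitting the data frame, by summing a masked numerator and the denominator per group with tapply. This is only a sketch of an alternative, not the answerer's method; df1 is rebuilt here from the question's table so the block runs on its own.

```r
# Example data from the question (column types are an assumption)
df1 <- data.frame(
  Group = c("A", "A", "A", "B", "B", "B", "C", "C"),
  Value = c(1, 2, 10, 3, 4, 2, 5, 2),
  Count = c(4, 2, 4, 2, 4, 3, 3, 6)
)

# Numerator: Count where Value == 2, zero elsewhere, summed per group
num <- tapply(df1$Count * (df1$Value == 2), df1$Group, sum)
# Denominator: total Count per group
den <- tapply(df1$Count, df1$Group, sum)

rel <- round(num / den, 2)
rel
```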

Answer 3:

You can apply the same logic with a for loop:

for(i in unique(df$Group))
  df$Relative_freq_of_2[df$Group==i] <- round(df$Count[df$Value==2 & df$Group==i]/sum(df$Count[df$Group==i]),2)


df <- unique(df[,c("Group","Relative_freq_of_2")])

Group Relative_freq_of_2
    A               0.20
    B               0.33
    C               0.67

Answer 4:

One with sqldf:

library(sqldf)
df1 <- sqldf('select `Group`,`Count` from df where Value=2')
df2 <- sqldf('select `Group`, sum(`Count`) as `Count` from df group by `Group`')
df1$Count <- df1$Count / df2$Count
df1
#   Group     Count
# 1     A 0.2000000
# 2     B 0.3333333
# 3     C 0.6666667
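
The division above pairs the two query results by row position, which works for this data because every group has exactly one row with Value 2 and both results list the groups in the same order. A sketch of a key-based alternative in base R, joining on Group with merge, is shown below. The data are hypothetical (group B has no row with Value 2), and the names twos, totals, and out are illustrative only.

```r
# Hypothetical data: group B has no row with Value 2
df <- data.frame(
  Group = c("A", "A", "B", "C", "C"),
  Value = c(2, 10, 3, 5, 2),
  Count = c(2, 8, 5, 3, 6)
)

# Rows with Value 2, and total Count per group
twos   <- df[df$Value == 2, c("Group", "Count")]
totals <- aggregate(Count ~ Group, data = df, FUN = sum)

# Join on Group; all.y = TRUE keeps groups that have no Value-2 row
out <- merge(twos, totals, by = "Group", all.y = TRUE,
             suffixes = c("_two", "_total"))

# A missing numerator means a relative frequency of zero
out$Count_two[is.na(out$Count_two)] <- 0
out$RelFreq <- out$Count_two / out$Count_total
out
```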

Comments:

Alternatively: sqldf("select [Group], sum(([Value] = 2) * [Count]) / (sum([Count]) + 0.0) Freq from df group by [Group]")
