Using aggregate/group_by in R to group data and give a count for each factor variable?
【Posted】2022-01-12 23:56:04
【Question】I have a data frame that looks like this. For simplicity I show only the first 6 rows, but there are 8236 rows in total. Grades range from 0 to 2; the example below only shows grades 0 and 1:
Telangiectasia_time grade
<chr> <int>
1 telangiectasia_tumour_0 0
2 telangiectasia_tumour_1 0
3 telangiectasia_tumour_12 0
4 telangiectasia_tumour_24 0
5 telangiectasia_tumour_0 1
6 telangiectasia_tumour_1 1
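I want to group by Telangiectasia_time (the first column) and then count how many times each grade occurs within each group. Using the first 6 rows as an example, the result should look like this: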
Telangiectasia_time grade0 grade1 grade2
1 telangiectasia_tumour_0 1 1 0
2 telangiectasia_tumour_1 1 1 0
3 telangiectasia_tumour_12 1 0 0
4 telangiectasia_tumour_24 1 0 0
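The last three columns represent the individual grades, each holding the count of that grade for each time point. I tried using the aggregate function: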
aggregate(grade ~ Telangiectasia_time, telangiectasia_tumour_data, sum)
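But I am not sure what to pass as the last argument so that it returns a count for each grade. When I pass sum, it simply adds the grade values together instead of treating them as separate categories (0, 1 and 2). With my full dataset I get this incorrect output: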
Telangiectasia_time grade
1 telangiectasia_tumour_0 18
2 telangiectasia_tumour_1 11
3 telangiectasia_tumour_12 38
4 telangiectasia_tumour_24 87
I have also tried group_by(), but that only gives me the total number of rows per group:
telangiectasia_tumour_data %>% group_by(Telangiectasia_time) %>% summarize(count =n())
Telangiectasia_time count
* <chr> <int>
1 telangiectasia_tumour_0 2059
2 telangiectasia_tumour_1 2059
3 telangiectasia_tumour_12 2059
4 telangiectasia_tumour_24 2059
【Answer 1】Using dplyr::count and tidyr::pivot_wider you could do:
library(dplyr)
library(tidyr)
telangiectasia_tumour_data %>%
  count(Telangiectasia_time, grade) %>%
  pivot_wider(names_from = grade, values_from = n, names_prefix = "grade", values_fill = 0)
#> # A tibble: 4 × 3
#> Telangiectasia_time grade0 grade1
#> <chr> <int> <int>
#> 1 telangiectasia_tumour_0 1 1
#> 2 telangiectasia_tumour_1 1 1
#> 3 telangiectasia_tumour_12 1 0
#> 4 telangiectasia_tumour_24 1 0
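Note that the output above has no grade2 column, because no row in the sample has grade 2. If a grade2 column is wanted even when that grade is absent, one possible variation (a sketch, not part of the original answer) is to turn grade into a factor with levels 0 to 2 and ask count() to keep empty levels via `.drop = FALSE`. The name `grade_counts` is introduced here only for illustration, and the data frame is rebuilt from the six sample rows so the snippet runs on its own:

```r
library(dplyr)
library(tidyr)

# The six sample rows from the question (grades 0 and 1 only)
telangiectasia_tumour_data <- data.frame(
  Telangiectasia_time = c(
    "telangiectasia_tumour_0", "telangiectasia_tumour_1",
    "telangiectasia_tumour_12", "telangiectasia_tumour_24",
    "telangiectasia_tumour_0", "telangiectasia_tumour_1"
  ),
  grade = c(0L, 0L, 0L, 0L, 1L, 1L)
)

grade_counts <- telangiectasia_tumour_data %>%
  # Declare all possible grades so unobserved ones are still known levels
  mutate(grade = factor(grade, levels = 0:2)) %>%
  # .drop = FALSE keeps factor levels with zero rows (count is 0)
  count(Telangiectasia_time, grade, .drop = FALSE) %>%
  # One column per grade level: grade0, grade1, grade2
  pivot_wider(names_from = grade, values_from = n, names_prefix = "grade")

print(grade_counts)
```

With this approach `values_fill` is not needed, since every time point and grade combination already has an explicit count before pivoting.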
Data
telangiectasia_tumour_data <- structure(list(Telangiectasia_time = c(
"telangiectasia_tumour_0",
"telangiectasia_tumour_1", "telangiectasia_tumour_12", "telangiectasia_tumour_24",
"telangiectasia_tumour_0", "telangiectasia_tumour_1"
), grade = c(
0L,
0L, 0L, 0L, 1L, 1L
)), class = "data.frame", row.names = c(
"1",
"2", "3", "4", "5", "6"
))
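Since the question started from base R's aggregate(), it may help to see that the same counts can probably be obtained without any packages by cross-tabulating with table(). This is a minimal sketch under the assumption that grades only take the values 0, 1 and 2; `grade_factor` and `counts_df` are names made up for this example:

```r
# The six sample rows from the question
telangiectasia_tumour_data <- data.frame(
  Telangiectasia_time = c(
    "telangiectasia_tumour_0", "telangiectasia_tumour_1",
    "telangiectasia_tumour_12", "telangiectasia_tumour_24",
    "telangiectasia_tumour_0", "telangiectasia_tumour_1"
  ),
  grade = c(0L, 0L, 0L, 0L, 1L, 1L)
)

# Fix the levels so that grade 2 gets a column even if it never occurs
grade_factor <- factor(telangiectasia_tumour_data$grade, levels = 0:2)

# Cross-tabulate time point (rows) against grade (columns)
counts <- table(telangiectasia_tumour_data$Telangiectasia_time, grade_factor)

# Convert the contingency table to a wide data frame
counts_df <- as.data.frame.matrix(counts)
names(counts_df) <- paste0("grade", names(counts_df))
counts_df <- data.frame(
  Telangiectasia_time = rownames(counts_df),
  counts_df,
  row.names = NULL
)

print(counts_df)
```

The reason sum gave misleading numbers in aggregate() is that it adds the grade values themselves; table() instead counts how many rows fall into each combination.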