R dplyr: performing different aggregations by group
I have a data frame dat that looks like this:
dat <- structure(list(cell.ID = c(329574L, 329574L, 329574L, 329574L,
329574L, 329574L, 329574L, 329574L, 329574L, 329574L, 329574L,
329574L), Year = c("2010", "2010", "2010", "2010", "2010", "2010",
"2010", "2010", "2010", "2010", "2010", "2010"), month_name = c("June",
"July", "June", "July", "June", "July", "June", "July", "June",
"July", "June", "July"), value = c(459.860986624053, 398.94083733151,
16, 23, 111.69, 453.333, 71.55, 30.38, 31.928, 30.13355, 17.587,
19.7938709677419), variable_name = c("ETo", "ETo", "Rday", "Rday",
"Rsum", "Rsum", "Thdd", "Thdd", "Tmax", "Tmax", "Tmin", "Tmin"
), monthID = c(6L, 7L, 6L, 7L, 6L, 7L, 6L, 7L, 6L, 7L, 6L, 7L
)), row.names = c(NA, -12L), class = "data.frame")
library(dplyr)
dat %>%
dplyr::group_by(Year, variable_name) %>%
dplyr::summarise(variable = sum(value))
If I want to average Tmax and Tmin and sum the remaining variables, I tried the following:
dat %>%
dplyr::group_by(Year, variable_name) %>%
dplyr::summarise(variable = ifelse(variable_name %in% c('Tmax', 'Tmin'), mean(value), sum(value)))
Error: Column `variable` must be length 1 (a summary value), not 2
How can I fix this?
Answer
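I think the problem is that ifelse operates element-wise here rather than at the group level, so for a group with two rows it returns two values where summarise expects one. If that is right, you can work around it by computing both summary statistics and then picking the one you want based on the variable name: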
dat %>%
dplyr::group_by(Year, variable_name) %>%
dplyr::summarise(var_mean = mean(value), var_sum = sum(value)) %>%
dplyr::mutate(variable = ifelse(variable_name %in% c('Tmax', 'Tmin'), var_mean, var_sum)) %>%
dplyr::select(-var_mean, -var_sum)
Result:
# A tibble: 6 x 3
# Groups:   Year [1]
  Year  variable_name variable
  <chr> <chr>            <dbl>
1 2010  ETo              859.
2 2010  Rday              39
3 2010  Rsum             565.
4 2010  Thdd             102.
5 2010  Tmax              31.0
6 2010  Tmin              18.7
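To see why the original ifelse call failed, the following base R sketch may help. It takes the two Tmax rows from the question's data as a single group and compares the length of what ifelse returns with what a plain if/else returns. The names vec_result and scalar_result are illustrative only and do not appear in the question.

```r
# The two Tmax rows from the question's data form one group.
variable_name <- c("Tmax", "Tmax")
value <- c(31.928, 30.13355)

# ifelse() is vectorised: its result has one element per element of the test,
# so a two-row group yields two (identical) values.
vec_result <- ifelse(variable_name %in% c("Tmax", "Tmin"), mean(value), sum(value))
length(vec_result)

# if/else evaluates a single condition and returns whatever the chosen
# branch returns, here a single number.
scalar_result <- if (variable_name[1] %in% c("Tmax", "Tmin")) mean(value) else sum(value)
length(scalar_result)
```

The first length is 2 and the second is 1, which is consistent with the "must be length 1" error reported in the question.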
Another answer
Another dplyr approach is to use if and else instead of ifelse:
dat %>%
group_by(Year, variable_name) %>%
summarise(variable = if (variable_name[1] %in% c('Tmax', 'Tmin')) mean(value) else sum(value))
# A tibble: 6 x 3
# Groups:   Year [1]
  Year  variable_name variable
  <chr> <chr>            <dbl>
1 2010  ETo              859.
2 2010  Rday              39
3 2010  Rsum             565.
4 2010  Thdd             102.
5 2010  Tmax              31.0
6 2010  Tmin              18.7
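If more than two kinds of aggregation are needed, the if/else idea could be extended with a lookup of functions keyed by variable name. The sketch below is one possible variation, not part of either answer: agg_funs, pick_fun and dat_small are hypothetical names introduced here, and dat_small is just four rows copied from the question's data so the block runs on its own.

```r
library(dplyr)

# A subset of the question's data: two variables, two months each.
dat_small <- data.frame(
  Year = "2010",
  variable_name = c("Rday", "Rday", "Tmax", "Tmax"),
  value = c(16, 23, 31.928, 30.13355),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: variables listed here are averaged, all others are summed.
agg_funs <- list(Tmax = mean, Tmin = mean)

# Return the aggregation function for one variable name (illustrative helper).
pick_fun <- function(name) {
  if (name %in% names(agg_funs)) agg_funs[[name]] else sum
}

result <- dat_small %>%
  group_by(Year, variable_name) %>%
  # variable_name is constant within a group, so its first element identifies it.
  summarise(variable = pick_fun(variable_name[1])(value)) %>%
  ungroup()

result
```

On this subset, Rday is summed and Tmax is averaged. Adding another rule would only mean adding an entry to agg_funs rather than nesting further conditions.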