Summarise and group_by not working with factor variables
[Posted]: 2021-11-23 14:12:42
[Question]: I am currently using the tidyverse package, version 1.3.1, and when I run the following code:
library(tidyverse)

data <- data.frame(gender = c(1, 2, 1, 2, 2, 2, 2, 1, 2, 1),
                   age = c(18, 20, 21, 24, 25, 24, 24, 25, 22, 21))
data <- data %>%
  mutate(gender = factor(gender, levels = c("male", "female")))
data %>%
  group_by(gender) %>%
  summarise(mean = mean(age))
I get these results:
# A tibble: 1 × 2
  gender  mean
  <fct>  <dbl>
1 NA      22.4
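[Answer 1]: Yes, you should set labels instead of levels. The levels argument lists the values that factor() expects to find in the input; since the column contains 1 and 2, nothing matches "male" or "female" and every value becomes NA. The labels argument renames the levels that are found.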
library(dplyr)
# Run this on the original data frame, where gender is still the numeric 1/2 column
data %>%
  mutate(gender = factor(gender, labels = c("male", "female"))) %>%
  group_by(gender) %>%
  summarise(mean = mean(age))
#   gender  mean
#   <fct>  <dbl>
# 1 male    21.2
# 2 female  23.2
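The difference between levels and labels can be sketched in base R alone. This is only an illustration under the question's coding (1 = male, 2 = female); the names wrong and right are made up for the example and do not appear in the original post.

```r
# Same coding as in the question: 1 = male, 2 = female
gender <- c(1, 2, 1, 2, 2, 2, 2, 1, 2, 1)

# levels lists the values expected in the input; neither 1 nor 2 matches
# "male" or "female", so every element is turned into NA
wrong <- factor(gender, levels = c("male", "female"))
all(is.na(wrong))

# levels names the values present in the input, labels renames them
# in the same order
right <- factor(gender, levels = c(1, 2), labels = c("male", "female"))
table(right)
```

Passing both levels and labels makes the mapping explicit instead of relying on the sorted order of the input values.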
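[Answer 2]: We don't need to convert to a factor to recode. It can be done directly by using 'gender' (a numeric variable) as an index into the vector of replacement values: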
library(dplyr)
data %>%
  group_by(gender = c("male", "female")[gender]) %>%
  summarise(mean = mean(age, na.rm = TRUE))
-output
# A tibble: 2 × 2
  gender  mean
  <chr>  <dbl>
1 female  23.2
2 male    21.2
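Positional indexing works here because the codes happen to be 1 and 2. As a hedged variation, assuming the codes might not be consecutive integers starting at 1, a named lookup vector does the same recoding by name. The names lookup, gender_chr and means are illustrative only, and the sketch uses base R's tapply in place of group_by and summarise.

```r
gender <- c(1, 2, 1, 2, 2, 2, 2, 1, 2, 1)
age <- c(18, 20, 21, 24, 25, 24, 24, 25, 22, 21)

# Map each code to its label by name rather than by position
lookup <- c("1" = "male", "2" = "female")
gender_chr <- unname(lookup[as.character(gender)])

# Group means of age by the recoded labels
means <- tapply(age, gender_chr, mean)
means
```

A code missing from lookup would come back as NA, which makes unexpected values easy to spot.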
Or use fct_recode:
library(forcats)
data %>%
  group_by(gender = fct_recode(as.character(gender), male = "1",
                               female = "2")) %>%
  summarise(mean = mean(age, na.rm = TRUE))
# A tibble: 2 × 2
  gender  mean
  <fct>  <dbl>
1 male    21.2
2 female  23.2