Adding a column of total n for each group in a stacked frequency table
[Posted]: 2022-01-19 22:02:29
[Question]:
I have the following data:
id  animal  color   shape
1   bear    orange  circle
2   dog     NA      triangle
3   NA      yellow  square
4   cat     yellow  square
5   NA      yellow  rectangle
If I run this code:
df1 <- df %>%
  pivot_longer(
    -id,
    names_to = "Variable",
    values_to = "Level"
  ) %>%
  group_by(Variable, Level) %>%
  summarise(freq = n()) %>%
  mutate(percent = freq / sum(freq) * 100) %>%
  mutate(Variable = ifelse(duplicated(Variable), NA, Variable)) %>%
  ungroup()
I get the following output:
Variable  Level      freq(n=5)  percent
animal    bear       1          33.3
          dog        1          33.3
          cat        1          33.3
color     orange     1          25.0
          yellow     3          75.0
shape     circle     1          20.0
          triangle   1          20.0
          square     2          40.0
          rectangle  1          20.0
But I would also like to add a row after each variable that contains the total:
Variable  Level      freq(n=5)  percent
animal    bear       1          33.3
          dog        1          33.3
          cat        1          33.3
          total      3          100.0
color     orange     1          25.0
          yellow     3          75.0
          total      4          100.0
shape     circle     1          20.0
          triangle   1          20.0
          square     2          40.0
          rectangle  1          20.0
          total      5          100.0
I have tried different variations of mutate and summarise, but I keep getting the error "invalid 'type' (closure) of argument".
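The failing code is not shown in the question, so its actual cause is unknown. One common way to get this message, offered here only as a guess, is passing a function object to sum() instead of a column, for example writing sum(n) after the count has been stored in a column called freq (with dplyr loaded, a bare n refers to the function dplyr::n). The base R sketch below reproduces the message with `mean` standing in for any function passed by mistake; the `counts` data frame is made up for illustration.

```r
# Hypothetical data standing in for the summarised counts in the question
counts <- data.frame(
  Variable = c("animal", "animal", "color"),
  freq     = c(1L, 2L, 4L)
)

# Passing a function (a "closure") to sum() raises the error;
# `mean` is just a stand-in for any function object
msg <- tryCatch(
  sum(mean),
  error = function(e) conditionMessage(e)
)
print(msg)  # in an English locale: invalid 'type' (closure) of argument

# Summing the actual column works as expected
total <- sum(counts$freq)
print(total)
```

If the message appears inside mutate() or summarise(), it is worth checking that every name passed to sum() is a column that exists at that point in the pipeline.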
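[Comments on the question]:
janitor::adorn_totals
Your input and output don't match what actually happens: where did the initial NA values go? For me they are still there, but somehow they have disappeared from your expected output.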
[Answer 1]:
Here is one way to accomplish the task:
library(dplyr)
library(tidyr)
library(janitor)
df %>%
  pivot_longer(
    -id,
    names_to = "Variable",
    values_to = "Level"
  ) %>%
  group_by(Variable, Level) %>%
  summarise(freq = n()) %>%
  mutate(percent = freq / sum(freq) * 100) %>%
  ungroup() %>%
  group_by(Variable) %>%
  group_split() %>%
  adorn_totals() %>%
  bind_rows() %>%
  mutate(Level = ifelse(Level == last(Level), last(Variable), Level)) %>%
  mutate(Variable = ifelse(duplicated(Variable) |
                             Variable == "Total", NA, Variable))
Variable  Level      freq  percent
animal    bear          1       20
<NA>      cat           1       20
<NA>      dog           1       20
<NA>      <NA>          2       40
<NA>      Total         5      100
color     orange        1       20
<NA>      yellow        3       60
<NA>      <NA>          1       20
<NA>      Total         5      100
shape     circle        1       20
<NA>      rectangle     1       20
<NA>      square        2       40
<NA>      triangle      1       20
<NA>      Total         5      100
[Comments]:
Reason for the downvote?
[Answer 2]:
library(dplyr)
library(tidyr)
library(purrr)
library(janitor)
# Here df1 is the raw data frame created under "Data" below
df1 %>%
  pivot_longer(
    -id,
    names_to = "Variable",
    values_to = "Level"
  ) %>%
  group_by(Variable, Level) %>%
  summarise(freq = n()) %>%
  mutate(percent = freq / sum(freq) * 100) %>%
  group_split() %>%
  map_dfr(., ~ .x %>%
    adorn_totals(name = "total")) %>%
  mutate(Variable = ifelse(duplicated(Variable) & Variable != "total", NA, Variable)) %>%
  ungroup()
#>  Variable  Level      freq  percent
#>  animal    bear          1       20
#>  <NA>      cat           1       20
#>  <NA>      dog           1       20
#>  <NA>      <NA>          2       40
#>  total     -             5      100
#>  color     orange        1       20
#>  <NA>      yellow        3       60
#>  <NA>      <NA>          1       20
#>  total     -             5      100
#>  shape     circle        1       20
#>  <NA>      rectangle     1       20
#>  <NA>      square        2       40
#>  <NA>      triangle      1       20
#>  total     -             5      100
Data:
df1 <- read.table(text = "id animal color shape
1 bear orange circle
2 dog NA triangle
3 NA yellow square
4 cat yellow square
5 NA yellow rectangle", header = TRUE, stringsAsFactors = FALSE)
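The "-" in the Level column of the total rows above comes from adorn_totals() itself: it sums the numeric columns, writes the row label into the first column, and fills the remaining non-numeric columns with its `fill` value. A minimal sketch on a single group, assuming the janitor package is installed; the `one_group` data frame is made up for illustration.

```r
library(janitor)

# Hypothetical single-group frequency table
one_group <- data.frame(
  Variable = c("color", "color"),
  Level    = c("orange", "yellow"),
  freq     = c(1L, 3L),
  percent  = c(25, 75),
  stringsAsFactors = FALSE
)

# Append a totals row: label goes in the first column,
# other non-numeric columns receive the `fill` value
with_total <- adorn_totals(one_group, where = "row", fill = "-", name = "total")
print(with_total)
```

Changing `fill` (for example to an empty string) is one way to control what appears in Level on the total row without a later mutate().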
[Answer 3]:
If we stop right after defining df1,
df1 <- df %>%
  pivot_longer(-id, names_to = "Variable", values_to = "Level") %>%
  group_by(Variable, Level) %>%
  summarise(freq = n()) %>%
  mutate(percent = freq / sum(freq) * 100)
df1
# # A tibble: 11 x 4
# # Groups:   Variable [3]
#    Variable Level      freq percent
#    <chr>    <chr>     <int>   <dbl>
#  1 animal   bear          1      20
#  2 animal   cat           1      20
#  3 animal   dog           1      20
#  4 animal   <NA>          2      40
#  5 color    orange        1      20
#  6 color    yellow        3      60
#  7 color    <NA>          1      20
#  8 shape    circle        1      20
#  9 shape    rectangle     1      20
# 10 shape    square        2      40
# 11 shape    triangle      1      20
then we can augment it with a per-group summary (and reorder the rows):
df1 %>%
  group_by(Variable) %>%
  summarize(Level = "total", across(freq:percent, sum)) %>%
  bind_rows(df1) %>%
  arrange(Variable, !is.na(Level), Level == "total", Level) %>%
  mutate(Variable = ifelse(duplicated(Variable), NA, Variable))
# # A tibble: 14 x 4
#    Variable Level      freq percent
#    <chr>    <chr>     <int>   <dbl>
#  1 animal   <NA>          2      40
#  2 <NA>     bear          1      20
#  3 <NA>     cat           1      20
#  4 <NA>     dog           1      20
#  5 <NA>     total         5     100
#  6 color    <NA>          1      20
#  7 <NA>     orange        1      20
#  8 <NA>     yellow        3      60
#  9 <NA>     total         5     100
# 10 shape    circle        1      20
# 11 <NA>     rectangle     1      20
# 12 <NA>     square        2      40
# 13 <NA>     triangle      1      20
# 14 <NA>     total         5     100
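The outputs in the answers keep the NA levels, so every total is 5, whereas the expected table in the question shows totals of 3, 4 and 5. If the intent is to leave missing values out of the counts, one possible variation (an assumption about what the asker wants, not something the answers state) is to drop them while reshaping with pivot_longer()'s values_drop_na argument. The names `counts`, `totals` and `result` are illustrative.

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "id animal color shape
1 bear orange circle
2 dog NA triangle
3 NA yellow square
4 cat yellow square
5 NA yellow rectangle", header = TRUE, stringsAsFactors = FALSE)

# Reshape to long form, dropping missing values before counting
counts <- df %>%
  pivot_longer(-id, names_to = "Variable", values_to = "Level",
               values_drop_na = TRUE) %>%
  count(Variable, Level, name = "freq") %>%
  group_by(Variable) %>%
  mutate(percent = freq / sum(freq) * 100) %>%
  ungroup()

# One "total" row per variable
totals <- counts %>%
  group_by(Variable) %>%
  summarise(Level = "total", across(c(freq, percent), sum))

# Append the totals and sort so each total follows its own variable's rows
result <- bind_rows(counts, totals) %>%
  arrange(Variable, Level == "total", Level)

print(result)
```

With NA values excluded, the per-variable totals become 3 (animal), 4 (color) and 5 (shape), and the percentages are computed over the non-missing observations only. The Variable column can still be blanked for repeated rows with the same ifelse(duplicated(Variable), NA, Variable) step used in the answers.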