Get level names using glue and dplyr in a loop
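【Posted】: 2019-08-29 03:11:04
【Question】:
I am trying to get level names from a table using dplyr and glue inside a loop (I use a loop because I have a large number of variables for which I need both grouped and individual tables). An example is shown below: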
library(dplyr)
library(glue)
var = c("vs", "am")
for (i in var) {
  bd = mtcars %>%
    group_by(carb) %>%
    count_(i) %>%
    mutate(descripcion = glue("carb number:{carb} in: {i} with freq: {n},"))
  print(bd)
  print(bd$descripcion)
}
My results:
# A tibble: 8 x 4
# Groups: carb [6]
carb vs n descripcion
<dbl> <dbl> <int> <chr>
1 1 1 7 carb number:1 in: vs with freq: 7,
2 2 0 5 carb number:2 in: vs with freq: 5,
3 2 1 5 carb number:2 in: vs with freq: 5,
4 3 0 3 carb number:3 in: vs with freq: 3,
5 4 0 8 carb number:4 in: vs with freq: 8,
6 4 1 2 carb number:4 in: vs with freq: 2,
7 6 0 1 carb number:6 in: vs with freq: 1,
8 8 0 1 carb number:8 in: vs with freq: 1,
[1] "carb number:1 in: vs with freq: 7," "carb number:2 in: vs with freq: 5,"
[3] "carb number:2 in: vs with freq: 5," "carb number:3 in: vs with freq: 3,"
[5] "carb number:4 in: vs with freq: 8," "carb number:4 in: vs with freq: 2,"
[7] "carb number:6 in: vs with freq: 1," "carb number:8 in: vs with freq: 1,"
# A tibble: 9 x 4
# Groups: carb [6]
carb am n descripcion
<dbl> <dbl> <int> <chr>
1 1 0 3 carb number:1 in: am with freq: 3,
2 1 1 4 carb number:1 in: am with freq: 4,
3 2 0 6 carb number:2 in: am with freq: 6,
4 2 1 4 carb number:2 in: am with freq: 4,
5 3 0 3 carb number:3 in: am with freq: 3,
6 4 0 7 carb number:4 in: am with freq: 7,
7 4 1 3 carb number:4 in: am with freq: 3,
8 6 1 1 carb number:6 in: am with freq: 1,
9 8 1 1 carb number:8 in: am with freq: 1,
[1] "carb number:1 in: am with freq: 3," "carb number:1 in: am with freq: 4,"
[3] "carb number:2 in: am with freq: 6," "carb number:2 in: am with freq: 4,"
[5] "carb number:3 in: am with freq: 3," "carb number:4 in: am with freq: 7,"
[7] "carb number:4 in: am with freq: 3," "carb number:6 in: am with freq: 1,"
[9] "carb number:8 in: am with freq: 1,"
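My problem is that in this example I cannot get the level names (values) of the vs and am variables into the description. My goal is to get individual tables grouped by carb, with descriptions like this: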
[1] "carb number:1 in: vs 1 with freq: 7," "carb number:2 in: vs 0 with freq: 5,"
[3] "carb number:2 in: vs 1 with freq: 5," "carb number:3 in: vs 0 with freq: 3,"
[5] "carb number:4 in: vs 0 with freq: 8," "carb number:4 in: vs 1 with freq: 2,"
[7] "carb number:6 in: vs 0 with freq: 1," "carb number:8 in: vs 0 with freq: 1,"
【Comments】:
【Answer 1】:
We can use paste0, since it is vectorized.
library(dplyr)
mtcars%>%
count(carb, vs) %>%
mutate(description = paste0("carb number: ",carb, " in: vs ", vs,
" with freq: ", n))
# carb vs n description
# <dbl> <dbl> <int> <chr>
#1 1 1 7 carb number: 1 in: vs 1 with freq: 7
#2 2 0 5 carb number: 2 in: vs 0 with freq: 5
#3 2 1 5 carb number: 2 in: vs 1 with freq: 5
#4 3 0 3 carb number: 3 in: vs 0 with freq: 3
#5 4 0 8 carb number: 4 in: vs 0 with freq: 8
#6 4 1 2 carb number: 4 in: vs 1 with freq: 2
#7 6 0 1 carb number: 6 in: vs 0 with freq: 1
#8 8 0 1 carb number: 8 in: vs 0 with freq: 1
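To illustrate what "vectorized" means here, the following minimal sketch uses three small toy vectors in place of the real columns (the values are only illustrative): the constant strings are recycled while the vectors are combined element by element, which is why no loop is needed inside mutate.

```r
# Toy vectors standing in for the carb, vs and n columns (illustrative values)
carb <- c(1, 2, 2)
vs   <- c(1, 0, 1)
n    <- c(7L, 5L, 5L)

# Constant strings are recycled; the vectors are pasted element by element
description <- paste0("carb number: ", carb, " in: vs ", vs, " with freq: ", n)
print(description)
```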
To use glue, one option is a variant of map from purrr (here pmap_chr), which builds the string row by row
library(dplyr)
library(glue)
library(purrr)
mtcars%>%
count(carb, vs) %>%
mutate(description = pmap_chr(list(carb, vs, n), function(a, b, c)
glue("carb number: ",a, " in: vs ", b, " with freq: ", c)))
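As a side note, glue is itself vectorized over the values it interpolates, so the row-wise pmap_chr call is optional. A minimal sketch, assuming the result is stored in a variable named `res` (a name chosen here for illustration) and converted to a plain character column with as.character:

```r
library(dplyr)
library(glue)

# glue() interpolates whole columns at once when called inside mutate()
res <- mtcars %>%
  count(carb, vs) %>%
  mutate(description = as.character(
    glue("carb number: {carb} in: vs {vs} with freq: {n}")))

print(res$description)
```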
Edit
If we want to count different columns, we can convert the variable names to symbols
var = c("vs", "am")
library(rlang)
map(var, function(x) mtcars%>%
count(carb, !!sym(x)) %>%
mutate(description = paste0("carb number: ",carb, " in: ",
x, " " , !!sym(x)," with freq: ", n)))
#[[1]]
# A tibble: 8 x 4
# carb vs n description
# <dbl> <dbl> <int> <chr>
#1 1 1 7 carb number: 1 in: vs 1 with freq: 7
#2 2 0 5 carb number: 2 in: vs 0 with freq: 5
#3 2 1 5 carb number: 2 in: vs 1 with freq: 5
#4 3 0 3 carb number: 3 in: vs 0 with freq: 3
#5 4 0 8 carb number: 4 in: vs 0 with freq: 8
#6 4 1 2 carb number: 4 in: vs 1 with freq: 2
#7 6 0 1 carb number: 6 in: vs 0 with freq: 1
#8 8 0 1 carb number: 8 in: vs 0 with freq: 1
#[[2]]
# A tibble: 9 x 4
# carb am n description
# <dbl> <dbl> <int> <chr>
#1 1 0 3 carb number: 1 in: am 0 with freq: 3
#2 1 1 4 carb number: 1 in: am 1 with freq: 4
#3 2 0 6 carb number: 2 in: am 0 with freq: 6
#4 2 1 4 carb number: 2 in: am 1 with freq: 4
#5 3 0 3 carb number: 3 in: am 0 with freq: 3
#6 4 0 7 carb number: 4 in: am 0 with freq: 7
#7 4 1 3 carb number: 4 in: am 1 with freq: 3
#8 6 1 1 carb number: 6 in: am 1 with freq: 1
#9 8 1 1 carb number: 8 in: am 1 with freq: 1
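An alternative to `!!sym(x)` that avoids loading rlang is the `.data` pronoun, which dplyr documents for looking up a column by a name held in a string. The sketch below is one possible way to write the same computation; the names `vars` and `tables` are illustrative and not part of the answer.

```r
library(dplyr)
library(purrr)

vars <- c("vs", "am")

# .data[[x]] looks up the column whose name is stored in the string x
tables <- map(vars, function(x) {
  mtcars %>%
    count(carb, .data[[x]]) %>%
    mutate(description = paste0("carb number: ", carb, " in: ", x, " ",
                                .data[[x]], " with freq: ", n))
})

print(tables[[1]])
```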
Or with a for loop
for (i in var)
print(mtcars%>%
count(carb, !!sym(i)) %>%
mutate(description = paste0("carb number: ",carb, " in: ", i, " " ,
!!sym(i), " with freq: ", n)))
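If the tables are needed later rather than only printed, the loop can collect them in a named list. This is a small sketch of that idea; the names `vars` and `tables` are assumptions introduced here.

```r
library(dplyr)
library(rlang)

vars <- c("vs", "am")
tables <- list()

for (i in vars) {
  # Store each result under the name of the counted variable
  tables[[i]] <- mtcars %>%
    count(carb, !!sym(i)) %>%
    mutate(description = paste0("carb number: ", carb, " in: ", i, " ",
                                !!sym(i), " with freq: ", n))
}

print(tables$am$description)
```

Each table can then be retrieved by name, for example `tables$vs` or `tables[["am"]]`.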
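【Comments】:
@Rodrigo Why do you need a loop? As shown, paste0 works without any loop. Or, if you want to use glue, you can use pmap.
I have a large number of variables for building grouped tables; I only showed one example.
No matter how large your table is, this should work as long as it follows the same structure as mtcars. Also note that I did not use group_by; instead I included that variable in count, so that it automatically counts each unique combination of carb and vs. Could you give it a try?
【Answer 2】: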
We can use glue_data directly on the columns, without any loop
library(glue)
library(dplyr)
mtcars %>%
count(carb, vs) %>%
mutate(description = glue_data(., "carb number: {carb} in: vs {vs} with freq: {n}"))
# A tibble: 8 x 4
# carb vs n description
# <dbl> <dbl> <int> <S3: glue>
#1 1 1 7 carb number: 1 in: vs 1 with freq: 7
#2 2 0 5 carb number: 2 in: vs 0 with freq: 5
#3 2 1 5 carb number: 2 in: vs 1 with freq: 5
#4 3 0 3 carb number: 3 in: vs 0 with freq: 3
#5 4 0 8 carb number: 4 in: vs 0 with freq: 8
#6 4 1 2 carb number: 4 in: vs 1 with freq: 2
#7 6 0 1 carb number: 6 in: vs 0 with freq: 1
#8 8 0 1 carb number: 8 in: vs 0 with freq: 1
If we have different grouping variables, convert the name to a symbol with sym inside count and evaluate it (!!), and in the glue_data template change the 'var' part to .x
library(rlang)
library(purrr)
map(var, ~ mtcars %>%
count(carb, !! sym(.x)) %>%
mutate(description = glue_data(.,
"carb number: {carb} in: vs {.x} with freq: {n}")))
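If the description should also carry the level value of the looped column, as in the goal stated in the question, one possible variant passes that column to glue as a named argument. This is a sketch under assumptions: the names `vars`, `v`, `level` and `tables` are illustrative and not taken from the answer, and plain glue is used here instead of glue_data.

```r
library(dplyr)
library(glue)
library(purrr)
library(rlang)

vars <- c("vs", "am")

tables <- map(vars, function(v) {
  mtcars %>%
    count(carb, !!sym(v)) %>%
    # {v} is the variable name; {level} is the value of that column in each row
    mutate(description = as.character(
      glue("carb number: {carb} in: {v} {level} with freq: {n}",
           level = !!sym(v))))
})

print(tables[[2]]$description)
```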