Get level names using glue and dplyr in a loop

【Posted】: 2019-08-29 03:11:04
【Question】:

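I am trying to get the level names from a table using dplyr and glue inside a loop (I use a loop because I have a large number of variables for which I need grouped and individual tables). An example is shown below: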

library(dplyr)
library(glue)

var <- c("vs", "am")
for (i in var) {
  # count_() is the (now deprecated) string-based version of count()
  bd <- mtcars %>%
    group_by(carb) %>%
    count_(i) %>%
    mutate(descripcion = glue("carb number:{carb} in: {i} with freq: {n},"))
  print(bd)
  print(bd$descripcion)
}

My result:

# A tibble: 8 x 4
# Groups:   carb [6]

   carb    vs     n descripcion                       
  <dbl> <dbl> <int> <chr>                             
1     1     1     7 carb number:1 in: vs with freq: 7,
2     2     0     5 carb number:2 in: vs with freq: 5,
3     2     1     5 carb number:2 in: vs with freq: 5,
4     3     0     3 carb number:3 in: vs with freq: 3,
5     4     0     8 carb number:4 in: vs with freq: 8,
6     4     1     2 carb number:4 in: vs with freq: 2,
7     6     0     1 carb number:6 in: vs with freq: 1,
8     8     0     1 carb number:8 in: vs with freq: 1,
[1] "carb number:1 in: vs with freq: 7," "carb number:2 in: vs with freq: 5,"
[3] "carb number:2 in: vs with freq: 5," "carb number:3 in: vs with freq: 3,"
[5] "carb number:4 in: vs with freq: 8," "carb number:4 in: vs with freq: 2,"
[7] "carb number:6 in: vs with freq: 1," "carb number:8 in: vs with freq: 1,"
# A tibble: 9 x 4
# Groups:   carb [6]
   carb    am     n descripcion                       
  <dbl> <dbl> <int> <chr>                             
1     1     0     3 carb number:1 in: am with freq: 3,
2     1     1     4 carb number:1 in: am with freq: 4,
3     2     0     6 carb number:2 in: am with freq: 6,
4     2     1     4 carb number:2 in: am with freq: 4,
5     3     0     3 carb number:3 in: am with freq: 3,
6     4     0     7 carb number:4 in: am with freq: 7,
7     4     1     3 carb number:4 in: am with freq: 3,
8     6     1     1 carb number:6 in: am with freq: 1,
9     8     1     1 carb number:8 in: am with freq: 1,
[1] "carb number:1 in: am with freq: 3," "carb number:1 in: am with freq: 4,"
[3] "carb number:2 in: am with freq: 6," "carb number:2 in: am with freq: 4,"
[5] "carb number:3 in: am with freq: 3," "carb number:4 in: am with freq: 7,"
[7] "carb number:4 in: am with freq: 3," "carb number:6 in: am with freq: 1,"
[9] "carb number:8 in: am with freq: 1,"

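My problem is that in this example I cannot get the level names of the vs and am variables into the description. My goal is to get individual tables grouped by carb, with descriptions like this: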

[1] "carb number:1 in:  vs 1 with freq: 7," "carb number:2 in:  vs 0 with freq: 5,"
[3] "carb number:2 in:  vs 1 with freq: 5," "carb number:3 in:  vs 0 with freq: 3,"
[5] "carb number:4 in:  vs 0 with freq: 8," "carb number:4 in:  vs 1 with freq: 2,"
[7] "carb number:6 in:  vs 0 with freq: 1," "carb number:8 in:  vs 0 with freq: 1,"


【Answer 1】:

We can use paste0, since it is vectorized.

library(dplyr)

mtcars%>% 
   count(carb, vs) %>%
   mutate(description = paste0("carb number: ",carb, " in: vs ", vs, 
                                " with freq: ", n))


#   carb    vs     n description                         
#  <dbl> <dbl> <int> <chr>                               
#1     1     1     7 carb number: 1 in: vs 1 with freq: 7
#2     2     0     5 carb number: 2 in: vs 0 with freq: 5
#3     2     1     5 carb number: 2 in: vs 1 with freq: 5
#4     3     0     3 carb number: 3 in: vs 0 with freq: 3
#5     4     0     8 carb number: 4 in: vs 0 with freq: 8
#6     4     1     2 carb number: 4 in: vs 1 with freq: 2
#7     6     0     1 carb number: 6 in: vs 0 with freq: 1
#8     8     0     1 carb number: 8 in: vs 0 with freq: 1

To use glue with the values passed in as separate arguments, we can use one of the map variants from purrr

library(dplyr)
library(glue)
library(purrr)

mtcars%>% 
   count(carb, vs) %>%
   mutate(description = pmap_chr(list(carb, vs, n), function(a, b, c) 
             glue("carb number: ",a, " in: vs ", b, " with freq: ", c)))
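
As a side note, glue() is itself vectorized over the expressions inside its braces, so inside mutate() it can usually refer to the columns directly. The following is a minimal sketch of that idea, not part of the original answer; the name res is only illustrative.

```r
library(dplyr)
library(glue)

# glue() evaluates {carb}, {vs} and {n} against the columns of the
# counted table, producing one string per row
res <- mtcars %>%
  count(carb, vs) %>%
  mutate(description = glue("carb number: {carb} in: vs {vs} with freq: {n}"))

print(res)
```

The description column here has class glue; wrapping the call in as.character() gives a plain character column if that is preferred.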

Edit

If we want to count different columns, we can convert the variable names to symbols

var = c("vs", "am")
library(rlang)

map(var, function(x) mtcars%>% 
                       count(carb, !!sym(x)) %>%
                       mutate(description = paste0("carb number: ",carb, " in: ", 
                        x, " " , !!sym(x)," with freq: ", n)))

#[[1]]
# A tibble: 8 x 4
#   carb    vs     n description                         
#  <dbl> <dbl> <int> <chr>                               
#1     1     1     7 carb number: 1 in: vs 1 with freq: 7
#2     2     0     5 carb number: 2 in: vs 0 with freq: 5
#3     2     1     5 carb number: 2 in: vs 1 with freq: 5
#4     3     0     3 carb number: 3 in: vs 0 with freq: 3
#5     4     0     8 carb number: 4 in: vs 0 with freq: 8
#6     4     1     2 carb number: 4 in: vs 1 with freq: 2
#7     6     0     1 carb number: 6 in: vs 0 with freq: 1
#8     8     0     1 carb number: 8 in: vs 0 with freq: 1

#[[2]]
# A tibble: 9 x 4
#   carb    am     n description                         
#  <dbl> <dbl> <int> <chr>                               
#1     1     0     3 carb number: 1 in: am 0 with freq: 3
#2     1     1     4 carb number: 1 in: am 1 with freq: 4
#3     2     0     6 carb number: 2 in: am 0 with freq: 6
#4     2     1     4 carb number: 2 in: am 1 with freq: 4
#5     3     0     3 carb number: 3 in: am 0 with freq: 3
#6     4     0     7 carb number: 4 in: am 0 with freq: 7
#7     4     1     3 carb number: 4 in: am 1 with freq: 3
#8     6     1     1 carb number: 6 in: am 1 with freq: 1
#9     8     1     1 carb number: 8 in: am 1 with freq: 1

Or with a for loop

for (i in var) 
   print(mtcars%>% 
           count(carb, !!sym(i)) %>%
           mutate(description = paste0("carb number: ",carb, " in: ", i, " " , 
                                  !!sym(i), " with freq: ", n)))
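
If the tables are needed later rather than just printed, the same loop can store each result in a list. This is only a sketch under that assumption; the list name tables is made up for illustration.

```r
library(dplyr)
library(rlang)

var <- c("vs", "am")
tables <- list()  # hypothetical container, one element per variable

for (i in var) {
  # same pipeline as in the loop above, stored under the variable's name
  tables[[i]] <- mtcars %>%
    count(carb, !!sym(i)) %>%
    mutate(description = paste0("carb number: ", carb, " in: ", i, " ",
                                !!sym(i), " with freq: ", n))
}

print(tables$am)
```

Each table can then be retrieved by name, for example tables$vs or tables[["am"]].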

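【Comments】:

@Rodrigo Why do you need a loop? As shown, paste0 works without any loop. Or, if you want to use glue, you can use pmap.

I have a large number of variables for building grouped tables; I only showed one example.

However large your table is, this should work as long as it follows the same structure as mtcars. Also note that I did not use group_by; instead I included that variable in count, so that it automatically counts every unique combination of carb and vs. Could you try it once?

【Answer 2】: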

We can use glue_data directly on the columns, without any loop

library(glue)
library(dplyr)
mtcars %>% 
 count(carb, vs) %>%
 mutate(description = glue_data(., "carb number: {carb} in: vs {vs} with freq: {n}"))
# A tibble: 8 x 4
#   carb    vs     n description                         
#  <dbl> <dbl> <int> <S3: glue>                          
#1     1     1     7 carb number: 1 in: vs 1 with freq: 7
#2     2     0     5 carb number: 2 in: vs 0 with freq: 5
#3     2     1     5 carb number: 2 in: vs 1 with freq: 5
#4     3     0     3 carb number: 3 in: vs 0 with freq: 3
#5     4     0     8 carb number: 4 in: vs 0 with freq: 8
#6     4     1     2 carb number: 4 in: vs 1 with freq: 2
#7     6     0     1 carb number: 6 in: vs 0 with freq: 1
#8     8     0     1 carb number: 8 in: vs 0 with freq: 1
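
To make the lookup rule explicit: glue_data() takes a list or data frame as its first argument and resolves the names inside the braces against it, row by row. A tiny standalone sketch (the object name out is only illustrative):

```r
library(glue)

# {mpg} and {cyl} are looked up as columns of the data frame
# passed as the first argument
out <- glue_data(head(mtcars, 2), "{mpg} mpg, {cyl} cyl")
print(out)
```

This is why the `.` placeholder works in the pipeline above: it passes the counted table as the data in which carb, vs and n are found.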

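If we have different grouping variables, convert each name to a symbol with sym inside count and unquote it (!!). In the glue_data template, use .x for the variable name and .data[[.x]] for its level

library(rlang)
library(purrr)
var <- c("vs", "am")
map(var, ~ mtcars %>%
             count(carb, !! sym(.x)) %>%
             mutate(description = glue_data(.,
                 "carb number: {carb} in: {.x} {.data[[.x]]} with freq: {n}")))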
