Apply reduce to each row (containing a list) of a column in an R data frame

Posted 2023-03-16


Asked: 2021-10-17 12:12:18

Question:

I have included 20 rows of the data frame:

structure(list(countyfips = c(1003, 1003, 1003, 1003, 1003, 1005, 
1005, 1005, 1005, 1005, 1007, 1007, 1007, 1007, 1007, 1009, 1009, 
1009, 1009, 1009), engagement = c("-.186", "-.231", "-.0681", 
"-.38", "-.267", "-.0148", ".00322", ".0804", "-.478", "-.83", 
"-.0532", "-.162", "-.0185", "-.883", "-.909", ".0278", "-.537", 
"-.691", "-.972", "-.981")), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))

I grouped it using the following expression:

library(dplyr)

math_stu_online_engage %>%
  group_by(countyfips) %>%
  summarise(monthly_engagement = list(engagement))

I now want to apply the following Reduce function to each list/row of the monthly_engagement column:

mutate(acc_perc_change = Reduce(function(x, y) x + x * y, monthly_engagement))

But I get this error:

Error: Problem with `mutate()` input `acc_perc_change`.
x non-numeric argument to binary operator
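A minimal base R reproduction of this error, using a hypothetical two-element list column (`monthly_engagement` below is a stand-in, not the real data). Reduce() walks across the list elements, so its x and y are whole character vectors, and arithmetic on character vectors fails:

```r
# Hypothetical list column: one character vector per county, as produced by
# summarise(monthly_engagement = list(engagement)) on a character column
monthly_engagement <- list(c("-.186", "-.231"), c("-.0148", ".00322"))

# Reduce() combines list elements, so x and y are character vectors here
err <- tryCatch(
  Reduce(function(x, y) x + x * y, monthly_engagement),
  error = function(e) conditionMessage(e)
)
print(err)
# [1] "non-numeric argument to binary operator"
```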

What am I doing wrong here?

Many thanks!


Answer 1:

Here is a solution in base R:

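# df is the data frame from the question
do.call(rbind, lapply(unique(df$countyfips), function(a) {
  # Rows for one county, with engagement converted from character to numeric
  tmp <- subset(df, countyfips == a)
  tmp <- transform(tmp, engagement = as.numeric(engagement))
  # Reduce returns a single value, which is recycled over the county's rows
  tmp$acc_perc_change <- Reduce(function(x, y) {
    x + x * y
  }, tmp$engagement)
  tmp
}))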
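A more compact base R variant, sketched with ave() on a hypothetical two-county subset of the question's data. ave() applies FUN within each countyfips group and recycles the single reduced value over that group's rows, so it yields the same acc_perc_change column without the subset/rbind round trip:

```r
# Hypothetical two-county subset of the question's data
df <- data.frame(
  countyfips = rep(c(1003, 1005), each = 5),
  engagement = c("-.186", "-.231", "-.0681", "-.38", "-.267",
                 "-.0148", ".00322", ".0804", "-.478", "-.83")
)

# Convert the character column once, up front
df$engagement <- as.numeric(df$engagement)

# ave() runs FUN per group and recycles the length-1 result over the group
df$acc_perc_change <- ave(
  df$engagement, df$countyfips,
  FUN = function(v) Reduce(function(x, y) x + x * y, v)
)
df
```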

Here is a simplified tidyverse solution:

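library(dplyr)
library(purrr)

df %>%
  group_by(countyfips) %>%
  # Several rows per group from summarise() works in dplyr 1.0;
  # dplyr >= 1.1.0 deprecates this in favour of reframe()
  summarise(engagement = as.numeric(engagement),
            acc_perc_change = reduce(engagement, ~ .x + .x * .y))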
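A small base R illustration of why the reduced value is recycled over each county's rows: Reduce() returns a single value, while Reduce(..., accumulate = TRUE) returns every intermediate result, one per input element (purrr::accumulate() is the tidyverse counterpart). The vector `v` is county 1003's values from the question:

```r
v <- as.numeric(c("-.186", "-.231", "-.0681", "-.38", "-.267"))
step <- function(x, y) x + x * y

final   <- Reduce(step, v)                     # one value: the final result
running <- Reduce(step, v, accumulate = TRUE)  # five values: each intermediate
final
running
```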

Comments:

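Glad it helped. The output of reduce or Reduce actually has length 1. However, if you use accumulate, or Reduce with accumulate = TRUE (essentially the same operation, except that it also returns the intermediate results), the output has the same length as the input vector.

Answer 2: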

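It is a character column, so we first need to convert it to numeric. Second, when Reduce/reduce is applied to the list column, it combines the list elements element-wise and returns a vector of length 5, while the summarised data has only 4 rows (one per county). So the result would have to be wrapped in a list.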
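If you want to keep one row per county, each county's vector has to be reduced separately instead of reducing across the list. A base R sketch on a hypothetical two-county subset, where split() stands in for summarise(monthly_engagement = list(...)); in dplyr this would typically be written with purrr::map_dbl() over the list column:

```r
# Hypothetical two-county subset of the question's data
df <- data.frame(
  countyfips = rep(c(1003, 1005), each = 5),
  engagement = c("-.186", "-.231", "-.0681", "-.38", "-.267",
                 "-.0148", ".00322", ".0804", "-.478", "-.83")
)

# Named list with one numeric vector per county (converted first)
monthly_engagement <- split(as.numeric(df$engagement), df$countyfips)

# Reduce inside each list element, not across the list
acc <- vapply(
  monthly_engagement,
  function(v) Reduce(function(x, y) x + x * y, v),
  numeric(1)
)

# One row per county, like the summarised tibble
result <- data.frame(countyfips = as.numeric(names(acc)),
                     acc_perc_change = unname(acc))
result
```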

library(dplyr)
library(purrr)
df1 %>% 
    group_by(countyfips) %>%
    summarise(monthly_engagement = as.numeric(engagement)) %>% 
    mutate(acc_perc_change = 
          reduce(monthly_engagement, ~ .x + .x * .y)) %>%
    ungroup

Output:

# A tibble: 20 x 3
   countyfips monthly_engagement acc_perc_change
        <dbl>              <dbl>           <dbl>
 1       1003           -0.186       -0.0606    
 2       1003           -0.231       -0.0606    
 3       1003           -0.0681      -0.0606    
 4       1003           -0.38        -0.0606    
 5       1003           -0.267       -0.0606    
 6       1005           -0.0148      -0.00142   
 7       1005            0.00322     -0.00142   
 8       1005            0.0804      -0.00142   
 9       1005           -0.478       -0.00142   
10       1005           -0.83        -0.00142   
11       1007           -0.0532      -0.000466  
12       1007           -0.162       -0.000466  
13       1007           -0.0185      -0.000466  
14       1007           -0.883       -0.000466  
15       1007           -0.909       -0.000466  
16       1009            0.0278       0.00000212
17       1009           -0.537        0.00000212
18       1009           -0.691        0.00000212
19       1009           -0.972        0.00000212
20       1009           -0.981        0.00000212
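A quick sanity check on these numbers: each step computes x + x * y = x * (1 + y), so the reduction equals the first value times the product of (1 + y) over the remaining values. The sketch below compares both forms for county 1003:

```r
v <- as.numeric(c("-.186", "-.231", "-.0681", "-.38", "-.267"))  # county 1003

# x + x * y == x * (1 + y), so the reduction telescopes into a product
via_reduce  <- Reduce(function(x, y) x + x * y, v)
closed_form <- v[1] * prod(1 + v[-1])

all.equal(via_reduce, closed_form)  # TRUE
round(closed_form, 4)               # -0.0606
```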

Comments:

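Thank you very much for your reply! I'm getting all NAs in the acc_perc_change column, with the following warning: Warning message in mask$eval_all_summarise(quo): "NAs introduced by coercion". Any idea why?

@Erin With your example, I don't get any warnings or NAs.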
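One plausible cause, assuming the full data contains strings that as.numeric() cannot parse: those become NA, and a single NA propagates through every later step of x + x * y, so that county's result is NA. A base R sketch for finding the offending strings; the "" and "n/a" entries are hypothetical, not taken from the question:

```r
# Two valid values from the question plus two assumed malformed entries
engagement <- c("-.186", ".00322", "", "n/a")

# Convert quietly, then list the entries that were text but did not parse
num <- suppressWarnings(as.numeric(engagement))
bad <- engagement[is.na(num) & !is.na(engagement)]
bad

# One NA is enough to make the whole reduction NA: x + x * NA is NA
Reduce(function(x, y) x + x * y, num)
```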
