Using dplyr summarise with conditions

Posted 2023-02-14


Question posted: 2019-07-16 09:55:08

Question:

I'm currently trying to apply summarise in order to isolate the relevant observations from a large dataset. Here is a simple reproducible example:

df <- data.frame(c(1,1,1,2,2,2,3,3,3), as.logical(c(TRUE,FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,FALSE)),
                 as.numeric(c(0,5,0,0,0,0,7,0,7)))
colnames(df) <- c("ID", "Status", "Price")

  ID Status Price
1  1   TRUE     0
2  1  FALSE     5
3  1   TRUE     0
4  2   TRUE     0
5  2   TRUE     0
6  2   TRUE     0
7  3  FALSE     7
8  3   TRUE     0
9  3  FALSE     7

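I want to group the table by ID and get Status TRUE only if all three observations for that ID are TRUE. Then I want the price that corresponds to that status (i.e. 5 for ID 1, which is FALSE; 0 for ID 2, which is TRUE; and 7 for ID 3, which is FALSE).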
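As a cross-check of the intended logic, here is a minimal base-R sketch (not dplyr, and not from the original post) that computes the expected per-ID result directly from the rule above. It rebuilds df from the values shown; the helper name summarise_id is hypothetical.

```r
# Rebuild the example data from the question
df <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Status = c(TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE),
  Price  = c(0, 5, 0, 0, 0, 0, 7, 0, 7)
)

# Hypothetical helper: a group's status is TRUE only if every row is TRUE;
# its price is the first price among rows whose Status equals that overall status.
summarise_id <- function(d) {
  all_true <- all(d$Status)
  price <- d$Price[d$Status == all_true][1]
  data.frame(ID = d$ID[1], Status = all_true, Test = price)
}

# Apply the helper to each ID group and stack the one-row results
expected <- do.call(rbind, lapply(split(df, df$ID), summarise_id))
rownames(expected) <- NULL
print(expected)
```

Any dplyr solution should reproduce this table.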

From "Summarize with conditions in dplyr" I found that I can, as usual, specify conditions inside square brackets. So far my code looks like this:

library(dplyr)
result <- df %>%
  group_by(ID) %>%
  summarize(Status = all(Status), Test = ifelse(all(Status) == TRUE,
 first(Price[Status == TRUE]), first(Price[Status == FALSE]))) 

# This is what I get: 
# A tibble: 3 x 3
     ID Status  Test
  <dbl> <lgl>  <dbl>
1    1. FALSE     0.
2    2. TRUE      0.
3    3. FALSE     7.

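But as you can see, the price for ID = 1 is wrong (0 instead of 5). I've been experimenting with this for a while, so I'd appreciate any hints about where I'm going wrong.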


Answer 1:

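Status = all(Status) overwrites Status, so every later expression in the same summarise call sees the one-value summary instead of the original column. We can compute Test first and keep all(Status) as the second argument of summarise (or give the summary a different column name). Since the condition is a single TRUE/FALSE per group (whether all of 'Status' is TRUE), this can also be done with plain if/else: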
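The "change the column name" alternative is not shown in the answer itself. A minimal sketch of it, assuming dplyr is installed; the intermediate column name AllTrue is hypothetical. Because the summary is stored under a new name, Status in the Test expression still refers to the original per-row values.

```r
suppressPackageStartupMessages(library(dplyr))

df <- data.frame(
  ID     = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Status = c(TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE),
  Price  = c(0, 5, 0, 0, 0, 0, 7, 0, 7)
)

result <- df %>%
  group_by(ID) %>%
  summarise(
    # New name, so the original per-row Status column is not masked
    AllTrue = all(Status),
    # AllTrue is one TRUE/FALSE per group, so plain if/else is enough
    Test = if (AllTrue) first(Price) else first(Price[!Status])
  )
print(result)
```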

df %>%
   group_by(ID) %>% 
   summarise( Test = if(all(Status)) first(Price[Status]) else 
                   first(Price[!Status]), Status = all(Status))
# A tibble: 3 x 3
#     ID  Test Status
#   <dbl> <dbl> <lgl> 
#1     1     5 FALSE 
#2     2     0 TRUE  
#3     3     7 FALSE

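NOTE: It is better not to use ifelse when its arguments have unequal lengths, because ifelse returns a result the same length as its test argument.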
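A small base-R illustration of that behaviour (the vector is made up for the demo): with a length-1 test, ifelse silently keeps only the first element of a longer yes vector, while if/else returns the chosen branch unchanged.

```r
prices <- c(0, 5, 0)

# ifelse: the result has the length of the test (1), so only prices[1] survives
a <- ifelse(TRUE, prices, NA)

# if/else: returns the whole chosen branch as-is
b <- if (TRUE) prices else NA

print(a)
print(b)
```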


Answer 2:

You could do:

df %>%
  group_by(ID) %>%
  mutate(status = Status) %>%
  summarise(
    Status = all(Status),
    Test = ifelse(Status == TRUE,
                  first(Price),
                  first(Price[status == FALSE]))
  )

Output:

# A tibble: 3 x 3
     ID Status  Test
  <dbl> <lgl>  <dbl>
1     1 FALSE      5
2     2 TRUE       0
3     3 FALSE      7

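The problem is that you want to use Status for the Test column, but you have already overwritten it with all(Status), so it no longer contains the original values.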
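The mechanism can be reproduced outside dplyr. Within the ID = 1 group, once Status has been replaced by the length-1 summary FALSE, the condition Status == FALSE is a single TRUE, which R recycles over the whole Price vector, so taking the first element just returns the first price, 0. A base-R sketch of that one group, with [1] standing in for dplyr's first():

```r
# Values of the ID = 1 group
Price  <- c(0, 5, 0)
Status <- c(TRUE, FALSE, TRUE)

# What the question's summarise does first: overwrite Status with its summary
Status <- all(Status)                 # now a single FALSE

# The later expression indexes with a length-1 logical (TRUE), which is
# recycled and selects every price; the first one is 0
wrong <- Price[Status == FALSE][1]

# Keeping a copy of the per-row values selects only the FALSE row
orig_status <- c(TRUE, FALSE, TRUE)
right <- Price[orig_status == FALSE][1]

print(c(wrong = wrong, right = right))
```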

Make a copy beforehand (I saved it in status) and run the ifelse on that copy, and it works.

