Filter rows by conditions including equal and get maximum values using R

Posted 2023-02-14


【Title】: Filter rows by conditions including equal and get maximum values using R 【Posted】: 2022-01-17 14:13:29 【Question】:

Suppose I have a data frame df as follows:

df <- structure(list(date = c("2021-10-1", "2021-10-2", "2021-10-3", 
"2021-10-4", "2021-10-5", "2021-10-6", "2021-10-7", "2021-10-8", 
"2021-10-9"), value = c(190.3, 174.9, 163.2, 168.4, 168.6, 168.2, 
163.5, 161.6, 172.9), type = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L)), class = "data.frame", row.names = c(NA, -9L))

I am trying to keep the rows that satisfy either of two conditions (OR, not AND):

type==2

type==1 with max(date)

My attempt:

df$date <- as.Date(df$date)

Method 1:

df[type==2 | date==max(df[type==1]$date)]

Output:

Error in `[.data.frame`(df, type == 2 | date == max(df[type == 1]$date)) : 
object 'type' not found

Method 2:

df %>%
  filter(type==2|date==max(df[type==1]$date))

Output:

Error: Problem with `filter()` input `..1`.
i Input `..1` is `type == 3 | date == max(df[type == 2]$date)`.
x undefined columns selected

However, when I use the same kind of expression in the code from this link, geom_point(data=df[type==3 | date==max(df[type==2]$date)],size=2, aes(shape=type)), it works.

Expected result:

I would like to know how to filter correctly with both of the methods above. Thanks.


【Answer 1】:

Please check whether this produces the expected output.

library(dplyr)

df2 <- df %>%
  mutate(date = as.Date(date)) %>%
  filter(type == 2 | (type == 1 & date == max(date[type == 1])))
df2
#         date value type
# 1 2021-10-05 168.6    1
# 2 2021-10-06 168.2    2
# 3 2021-10-07 163.5    2
# 4 2021-10-08 161.6    2
# 5 2021-10-09 172.9    2
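
As background on why the second attempt in the question fails, here is a minimal sketch using the same data. It assumes the failure comes from `df[type == 1]` being written without a comma: with a single index, a data frame treats a logical vector as a column selector, and a 9-element vector does not fit the 3 columns. The names `is_type1`, `res` and `max_date_type1` are illustrative only.

```r
# Same data as in the question, with date already converted
df <- data.frame(
  date  = as.Date(c("2021-10-1", "2021-10-2", "2021-10-3", "2021-10-4",
                    "2021-10-5", "2021-10-6", "2021-10-7", "2021-10-8",
                    "2021-10-9")),
  value = c(190.3, 174.9, 163.2, 168.4, 168.6, 168.2, 163.5, 161.6, 172.9),
  type  = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L)
)

is_type1 <- df$type == 1

# Without a comma, the logical vector is used to pick COLUMNS, not rows.
# Capture the error message instead of stopping the script.
res <- tryCatch(df[is_type1], error = function(e) conditionMessage(e))
print(res)

# Subsetting the date column itself gives the dates of the type 1 rows,
# which is what max(date[type == 1]) does inside filter().
max_date_type1 <- max(df$date[is_type1])
print(max_date_type1)
```

Inside `filter()`, column names such as `date` and `type` refer to the columns directly, so `date[type == 1]` is the usual way to restrict a column to a subset of rows.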

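【Comments】:

Yes, it works. I see that type == 1 has to be used twice to express the second condition. Thanks a lot. What about the first method? df[type == 2 | (type == 1 & date == max(date[type == 1]))] doesn't seem to work and produces the same error.

@ahbon For your first method, you need to anchor every variable to the data frame and add a comma at the end to select rows, i.e. df[df$type == 2 | (df$type == 1 & df$date == max(df$date[df$type == 1])),]

OK, thanks for your help, but I don't understand why this isn't needed in geom_point(data=df[type==3 | date==max(df[type==2]$date)],size=2, aes(shape=type)).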
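
The base R fix from the comments can be sketched as below, together with a `with()` variant that avoids repeating `df$`. The last part is a guess rather than something the thread confirms: one plausible reason the `geom_point()` call works without `df$` is that `df` in the linked code was a data.table, whose `[` evaluates the row expression inside the table. That part only runs if the data.table package happens to be installed. The names `keep`, `base_res`, `with_res`, `dt` and `dt_res` are illustrative.

```r
df <- data.frame(
  date  = as.Date(c("2021-10-1", "2021-10-2", "2021-10-3", "2021-10-4",
                    "2021-10-5", "2021-10-6", "2021-10-7", "2021-10-8",
                    "2021-10-9")),
  value = c(190.3, 174.9, 163.2, 168.4, 168.6, 168.2, 163.5, 161.6, 172.9),
  type  = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L)
)

# Base R: every column is referenced through df$, and the trailing comma
# tells `[` that the logical vector selects rows (all columns are kept).
keep <- df$type == 2 |
  (df$type == 1 & df$date == max(df$date[df$type == 1]))
base_res <- df[keep, ]
print(base_res)

# with() evaluates the expression using df's columns, so df$ can be dropped.
with_res <- df[with(df, type == 2 |
                          (type == 1 & date == max(date[type == 1]))), ]

# Assumption: the linked code may have used a data.table. There, bare column
# names are allowed inside [ ] and a single index filters rows.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::as.data.table(df)
  dt_res <- dt[type == 2 | (type == 1 & date == max(date[type == 1]))]
  print(dt_res)
}
```

With a plain data.frame, the comma and the explicit `df$` (or `with()`) are both needed; the dplyr version in the answer handles the same scoping through `filter()`.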
