Left join and select the next observation in time in R

【Title】: Left join and select the next observation in time in R 【Posted】: 2021-09-24 18:40:09 【Question】:

Suppose I have two data frames:

df <- data.frame(ID=c("Ana", "Lola", "Ana"),
                 Date=c("2020-06-06", "2020-06-06", "2020-06-07"),
                 meat=c("fish", "poultry", "poultry"),
                 time_ordered=c("2020-06-06 12:24:39", "2020-06-06 12:34:36", "2020-06-07 12:24:39"))

df2 <- data.frame(ID=c("Ana", "Ana", "Lola", "Ana"),
                  Date=c("2020-06-06", "2020-06-06", "2020-06-06", "2020-06-07"),
                  meat=c("fish", "fish", "poultry", "poultry"),
                  time_received=c("2020-06-06 12:24:40", "2020-06-06 12:26:49", "2020-06-07 12:36:39", "2020-06-07 13:04:39"))

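Suppose I want to join these two data frames on ID and meat. Then, for a given observation, I want to match time_ordered with the first time_received that comes after it. For example, I should get a row with "ID = Ana, Date = 2020-06-06, meat = fish, time_ordered = 2020-06-06 12:24:39, time_received = 2020-06-06 12:24:40".

So the time_received "2020-06-06 12:26:49" would not be matched to anything. In other words, for each (ID, meat, time_ordered), I want a unique match to the smallest time_received that is later than time_ordered for the same ID and meat.

Thank you very much!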

【Comments】:

【Answer 1】:

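Join df and df2 by ID, meat and Date, keep only the rows where time_received > time_ordered, arrange the data by time_received, and keep only the first row for each combination of ID, Date and meat.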

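library(dplyr)
library(lubridate)

df %>%
  # attach every candidate receipt for the same ID, meat and Date
  left_join(df2, by = c('ID', 'meat', 'Date')) %>%
  # parse the character columns into date and date-time values
  mutate(Date = ymd(Date),
         across(c(time_ordered, time_received), ymd_hms)) %>%
  # keep only receipts that come after the order
  filter(time_received > time_ordered) %>%
  # earliest receipt first, then keep the first row per ID, Date and meat
  arrange(ID, Date, meat, time_received) %>%
  distinct(ID, Date, meat, .keep_all = TRUE)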

#    ID       Date    meat        time_ordered       time_received
#1  Ana 2020-06-06    fish 2020-06-06 12:24:39 2020-06-06 12:24:40
#2  Ana 2020-06-07 poultry 2020-06-07 12:24:39 2020-06-07 13:04:39
#3 Lola 2020-06-06 poultry 2020-06-06 12:34:36 2020-06-07 12:36:39
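
The question asks for a join on ID and meat only, with one match per time_ordered. A possible variation on the pipeline above, offered as a sketch and not as the answer's own method, groups by (ID, meat, time_ordered) and takes the earliest later receipt with `slice_min()`. The result name `next_received`, the use of `as.POSIXct()` with `tz = "UTC"`, and dropping the Date column are assumptions made for the illustration. It assumes dplyr 1.0.0 or later for `slice_min()`.

```r
library(dplyr)

# Same data as in the question, with timestamps parsed up front
# (tz = "UTC" is an assumption for this sketch)
df <- data.frame(
  ID = c("Ana", "Lola", "Ana"),
  meat = c("fish", "poultry", "poultry"),
  time_ordered = as.POSIXct(c("2020-06-06 12:24:39", "2020-06-06 12:34:36",
                              "2020-06-07 12:24:39"), tz = "UTC")
)

df2 <- data.frame(
  ID = c("Ana", "Ana", "Lola", "Ana"),
  meat = c("fish", "fish", "poultry", "poultry"),
  time_received = as.POSIXct(c("2020-06-06 12:24:40", "2020-06-06 12:26:49",
                               "2020-06-07 12:36:39", "2020-06-07 13:04:39"),
                             tz = "UTC")
)

next_received <- df %>%
  # pair each order with every receipt for the same ID and meat
  left_join(df2, by = c("ID", "meat")) %>%
  # discard receipts that are not strictly after the order
  filter(time_received > time_ordered) %>%
  # one group per order, then keep its earliest remaining receipt
  group_by(ID, meat, time_ordered) %>%
  slice_min(time_received, n = 1, with_ties = FALSE) %>%
  ungroup()

print(next_received)
```

Because the grouping includes time_ordered, two orders by the same person for the same meat on the same day each keep their own row, whereas `distinct(ID, Date, meat, ...)` would keep only one of them. Note that the `filter()` step drops any order that has no later receipt; if those rows should be kept with a missing time_received, the filter condition would need to be adjusted.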

【Discussion】:
