How do I select certain time interval before and after complete.cases from two merged dataframes?

Posted: 2021-11-27 10:17:36

Question:

I have merged two data frames by time:

Mergeboth <- merge(test, deeptest, by.x="date.time", by.y="Bottom.Start", all=TRUE)

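The test data frame is very large (over 1 million data points), while deeptest has only about 100. Checking with complete.cases() confirmed that the two data frames merged correctly. Now that they are merged, I would like to select a certain time interval before and after each complete case (say 2 minutes, to keep the output short, although my actual goal is 12 hours), assign a new number to all of the rows selected around each complete case, and output the result to a new data frame: df_new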
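As a small illustration of the merge-and-check step described above, here is a minimal sketch on made-up data. The two toy data frames only stand in for the real `test` and `deeptest`, and the names `merged` and `dive_rows` are hypothetical. Note that with `by.x` and `by.y` set to different names, `merge()` keeps the `by.x` name (`date.time`) for the key column.

```r
# Toy stand-ins for the question's `test` and `deeptest` (values are made up)
test <- data.frame(
  date.time = as.POSIXct("2015-12-22 01:09:40", tz = "UTC") + seq(0, 40, by = 10),
  Depth = c(475, 491, 508, 523.5, 540.5)
)
deeptest <- data.frame(
  Bottom.Start = as.POSIXct("2015-12-22 01:10:00", tz = "UTC"),
  Dive = 1238L,
  Max.Depth = 727
)

# Full outer merge on the timestamp; rows without a matching dive get NA
merged <- merge(test, deeptest, by.x = "date.time", by.y = "Bottom.Start", all = TRUE)

# Rows with no NA in any column are the ones that matched a dive
dive_rows <- complete.cases(merged)
sum(dive_rows)
merged[dive_rows, ]
```

If the count of complete rows equals the number of rows in the smaller table, every dive found a matching timestamp.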

So my current data frame looks like this:

MergeBoth()
 Bottom.Start           Dive Max.Depth Depth
 1: 2015-12-22 01:07:00   NA        NA 311.0
 2: 2015-12-22 01:07:10   NA        NA 308.5
 3: 2015-12-22 01:07:20   NA        NA 307.0
 4: 2015-12-22 01:07:30   NA        NA 306.5
 5: 2015-12-22 01:07:40   NA        NA 305.5
 6: 2015-12-22 01:07:50   NA        NA 308.5
 7: 2015-12-22 01:08:00   NA        NA 318.5
 8: 2015-12-22 01:08:10   NA        NA 331.0
 9: 2015-12-22 01:08:20   NA        NA 345.5
10: 2015-12-22 01:08:30   NA        NA 361.0
11: 2015-12-22 01:08:40   NA        NA 376.5
12: 2015-12-22 01:08:50   NA        NA 392.0
13: 2015-12-22 01:09:00   NA        NA 408.5
14: 2015-12-22 01:09:10   NA        NA 425.5
15: 2015-12-22 01:09:20   NA        NA 442.5
16: 2015-12-22 01:09:30   NA        NA 459.0
17: 2015-12-22 01:09:40   NA        NA 475.0
18: 2015-12-22 01:09:50   NA        NA 491.0
19: 2015-12-22 01:10:00 1238       727 508.0
20: 2015-12-22 01:10:10   NA        NA 523.5
21: 2015-12-22 01:10:20   NA        NA 540.5
22: 2015-12-22 01:10:30   NA        NA 556.5
23: 2015-12-22 01:10:40   NA        NA 572.5
24: 2015-12-22 01:10:50   NA        NA 581.0
25: 2015-12-22 01:11:00   NA        NA 583.5
26: 2015-12-22 01:11:10   NA        NA 582.0
27: 2015-12-22 01:11:20   NA        NA 581.0
28: 2015-12-22 01:11:30   NA        NA 581.0
29: 2015-12-22 01:11:40   NA        NA 581.0
30: 2015-12-22 01:11:50   NA        NA 581.0
31: 2015-12-22 01:12:00   NA        NA 582.0
32: 2015-12-22 01:12:10   NA        NA 581.5
33: 2015-12-22 01:12:20   NA        NA 581.5
34: 2015-12-22 01:12:30   NA        NA 592.5
35: 2015-12-22 01:12:40   NA        NA 606.5
36: 2015-12-22 01:12:50   NA        NA 621.5
37: 2015-12-22 01:13:00   NA        NA 637.5
38: 2015-12-22 01:13:10   NA        NA 655.0
39: 2015-12-23 07:17:00   NA        NA 863.0
40: 2015-12-23 07:17:10   NA        NA 863.5
41: 2015-12-23 07:17:20   NA        NA 865.0
42: 2015-12-23 07:17:30   NA        NA 866.0
43: 2015-12-23 07:17:40   NA        NA 867.0
44: 2015-12-23 07:17:50   NA        NA 867.5
45: 2015-12-23 07:18:00   NA        NA 868.5
46: 2015-12-23 07:18:10   NA        NA 870.0
47: 2015-12-23 07:18:20   NA        NA 870.5
48: 2015-12-23 07:18:30   NA        NA 871.0
49: 2015-12-23 07:18:40   NA        NA 872.0
50: 2015-12-23 07:18:50   NA        NA 872.0
51: 2015-12-23 07:19:00 1267       970 874.0
52: 2015-12-23 07:19:10   NA        NA 875.0
53: 2015-12-23 07:19:20   NA        NA 875.0
54: 2015-12-23 07:19:30   NA        NA 876.0
55: 2015-12-23 07:19:40   NA        NA 876.5
56: 2015-12-23 07:19:50   NA        NA 876.0
57: 2015-12-23 07:20:00   NA        NA 878.0
58: 2015-12-23 07:20:10   NA        NA 876.0
59: 2015-12-23 07:20:20   NA        NA 875.5
60: 2015-12-23 07:20:30   NA        NA 875.5
61: 2015-12-23 07:20:40   NA        NA 874.0
62: 2015-12-23 07:20:50   NA        NA 872.5
63: 2015-12-23 07:21:00   NA        NA 870.5
64: 2015-12-23 07:21:10   NA        NA 867.0
65: 2015-12-23 07:21:20   NA        NA 863.5
66: 2015-12-23 07:21:30   NA        NA 860.5
67: 2015-12-23 07:21:40   NA        NA 859.0
68: 2015-12-23 07:21:50   NA        NA 861.0
69: 2015-12-23 07:22:00   NA        NA 864.5
70: 2015-12-23 07:22:10   NA        NA 868.5
71: 2015-12-23 07:22:20   NA        NA 874.5
72: 2015-12-23 07:22:30   NA        NA 882.0
73: 2015-12-23 07:22:40   NA        NA 894.0
74: 2015-12-23 07:22:50   NA        NA 907.0
75: 2015-12-23 07:23:00   NA        NA 922.5

This is what I am trying to get:

df_new()
            Bottom.Start Dive Max.Depth Depth DiveNumber
 1: 2015-12-22 01:08:00   NA        NA 318.5          1
 2: 2015-12-22 01:08:10   NA        NA 331.0          1
 3: 2015-12-22 01:08:20   NA        NA 345.5          1
 4: 2015-12-22 01:08:30   NA        NA 361.0          1
 5: 2015-12-22 01:08:40   NA        NA 376.5          1
 6: 2015-12-22 01:08:50   NA        NA 392.0          1
 7: 2015-12-22 01:09:00   NA        NA 408.5          1
 8: 2015-12-22 01:09:10   NA        NA 425.5          1
 9: 2015-12-22 01:09:20   NA        NA 442.5          1
10: 2015-12-22 01:09:30   NA        NA 459.0          1
11: 2015-12-22 01:09:40   NA        NA 475.0          1
12: 2015-12-22 01:09:50   NA        NA 491.0          1
13: 2015-12-22 01:10:00 1238       727 508.0          1
14: 2015-12-22 01:10:10   NA        NA 523.5          1
15: 2015-12-22 01:10:20   NA        NA 540.5          1
16: 2015-12-22 01:10:30   NA        NA 556.5          1
17: 2015-12-22 01:10:40   NA        NA 572.5          1
18: 2015-12-22 01:10:50   NA        NA 581.0          1
19: 2015-12-22 01:11:00   NA        NA 583.5          1
20: 2015-12-22 01:11:10   NA        NA 582.0          1
21: 2015-12-22 01:11:20   NA        NA 581.0          1
22: 2015-12-22 01:11:30   NA        NA 581.0          1
23: 2015-12-22 01:11:40   NA        NA 581.0          1
24: 2015-12-22 01:11:50   NA        NA 581.0          1
25: 2015-12-22 01:12:00   NA        NA 582.0          1
26: 2015-12-23 07:17:00   NA        NA 863.0          2
27: 2015-12-23 07:17:10   NA        NA 863.5          2
28: 2015-12-23 07:17:20   NA        NA 865.0          2
29: 2015-12-23 07:17:30   NA        NA 866.0          2
30: 2015-12-23 07:17:40   NA        NA 867.0          2
31: 2015-12-23 07:17:50   NA        NA 867.5          2
32: 2015-12-23 07:18:00   NA        NA 868.5          2
33: 2015-12-23 07:18:10   NA        NA 870.0          2
34: 2015-12-23 07:18:20   NA        NA 870.5          2
35: 2015-12-23 07:18:30   NA        NA 871.0          2
36: 2015-12-23 07:18:40   NA        NA 872.0          2
37: 2015-12-23 07:18:50   NA        NA 872.0          2
38: 2015-12-23 07:19:00 1267       970 874.0          2
39: 2015-12-23 07:19:10   NA        NA 875.0          2
40: 2015-12-23 07:19:20   NA        NA 875.0          2
41: 2015-12-23 07:19:30   NA        NA 876.0          2
42: 2015-12-23 07:19:40   NA        NA 876.5          2
43: 2015-12-23 07:19:50   NA        NA 876.0          2
44: 2015-12-23 07:20:00   NA        NA 878.0          2
45: 2015-12-23 07:20:10   NA        NA 876.0          2
46: 2015-12-23 07:20:20   NA        NA 875.5          2
47: 2015-12-23 07:20:30   NA        NA 875.5          2
48: 2015-12-23 07:20:40   NA        NA 874.0          2
49: 2015-12-23 07:20:50   NA        NA 872.5          2
50: 2015-12-23 07:21:00   NA        NA 870.5          2

Here is the dput() of my data frame:

structure(list(Bottom.Start = structure(c(1450746420, 1450746430, 
1450746440, 1450746450, 1450746460, 1450746470, 1450746480, 1450746490, 
1450746500, 1450746510, 1450746520, 1450746530, 1450746540, 1450746550, 
1450746560, 1450746570, 1450746580, 1450746590, 1450746600, 1450746610, 
1450746620, 1450746630, 1450746640, 1450746650, 1450746660, 1450746670, 
1450746680, 1450746690, 1450746700, 1450746710, 1450746720, 1450746730, 
1450746740, 1450746750, 1450746760, 1450746770, 1450746780, 1450746790, 
1450855020, 1450855030, 1450855040, 1450855050, 1450855060, 1450855070, 
1450855080, 1450855090, 1450855100, 1450855110, 1450855120, 1450855130, 
1450855140, 1450855150, 1450855160, 1450855170, 1450855180, 1450855190, 
1450855200, 1450855210, 1450855220, 1450855230, 1450855240, 1450855250, 
1450855260, 1450855270, 1450855280, 1450855290, 1450855300, 1450855310, 
1450855320, 1450855330, 1450855340, 1450855350, 1450855360, 1450855370, 
1450855380), tzone = "UTC", class = c("POSIXct", "POSIXt")), 
    Dive = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, 1238L, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, 1267L, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA), Max.Depth = c(NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 727, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    970, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), Depth = c(311, 
    308.5, 307, 306.5, 305.5, 308.5, 318.5, 331, 345.5, 361, 
    376.5, 392, 408.5, 425.5, 442.5, 459, 475, 491, 508, 523.5, 
    540.5, 556.5, 572.5, 581, 583.5, 582, 581, 581, 581, 581, 
    582, 581.5, 581.5, 592.5, 606.5, 621.5, 637.5, 655, 863, 
    863.5, 865, 866, 867, 867.5, 868.5, 870, 870.5, 871, 872, 
    872, 874, 875, 875, 876, 876.5, 876, 878, 876, 875.5, 875.5, 
    874, 872.5, 870.5, 867, 863.5, 860.5, 859, 861, 864.5, 868.5, 
    874.5, 882, 894, 907, 922.5)), row.names = c(NA, -75L), class = c("data.table", 
"data.frame"))

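I have tried combinations of lag(), tail() and dplyr functions including case_when() and mutate(), but can't seem to get the output I want. I am relatively new to R and would appreciate any help!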

Answer 1:

I'm not sure what your original data.frames looked like before the merge, but here is something to consider (using data.table, which may be relatively faster).

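First, you can create (or perhaps you already have?) a small data.frame containing only the complete cases. You can number the dives and work out the time window of interest around each of those rows.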

Then, you can do a non-equi join and select the rows of data whose Bottom.Start times fall within those windows.

library(data.table)
library(lubridate)

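# Keep only the rows where every column is present (one row per dive)
df_c <- df[complete.cases(df), ]
# Add columns by reference: a running dive number and the window bounds
df_c[, `:=` (DiveNumber = seq_len(.N), Start = Bottom.Start - minutes(2), End = Bottom.Start + minutes(2))]

# Non-equi join: keep rows of df whose Bottom.Start lies within [Start, End];
# x.Bottom.Start is the timestamp from df rather than the join bound
df[df_c, .(Bottom.Start = x.Bottom.Start, Dive, Max.Depth, Depth, DiveNumber), on = .(Bottom.Start >= Start, Bottom.Start <= End)]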

Output

           Bottom.Start Dive Max.Depth Depth DiveNumber
 1: 2015-12-22 01:08:00   NA        NA 318.5          1
 2: 2015-12-22 01:08:10   NA        NA 331.0          1
 3: 2015-12-22 01:08:20   NA        NA 345.5          1
 4: 2015-12-22 01:08:30   NA        NA 361.0          1
 5: 2015-12-22 01:08:40   NA        NA 376.5          1
 6: 2015-12-22 01:08:50   NA        NA 392.0          1
 7: 2015-12-22 01:09:00   NA        NA 408.5          1
 8: 2015-12-22 01:09:10   NA        NA 425.5          1
 9: 2015-12-22 01:09:20   NA        NA 442.5          1
10: 2015-12-22 01:09:30   NA        NA 459.0          1
11: 2015-12-22 01:09:40   NA        NA 475.0          1
12: 2015-12-22 01:09:50   NA        NA 491.0          1
13: 2015-12-22 01:10:00 1238       727 508.0          1
14: 2015-12-22 01:10:10   NA        NA 523.5          1
15: 2015-12-22 01:10:20   NA        NA 540.5          1
16: 2015-12-22 01:10:30   NA        NA 556.5          1
17: 2015-12-22 01:10:40   NA        NA 572.5          1
18: 2015-12-22 01:10:50   NA        NA 581.0          1
19: 2015-12-22 01:11:00   NA        NA 583.5          1
20: 2015-12-22 01:11:10   NA        NA 582.0          1
21: 2015-12-22 01:11:20   NA        NA 581.0          1
22: 2015-12-22 01:11:30   NA        NA 581.0          1
23: 2015-12-22 01:11:40   NA        NA 581.0          1
24: 2015-12-22 01:11:50   NA        NA 581.0          1
25: 2015-12-22 01:12:00   NA        NA 582.0          1
26: 2015-12-23 07:17:00   NA        NA 863.0          2
27: 2015-12-23 07:17:10   NA        NA 863.5          2
28: 2015-12-23 07:17:20   NA        NA 865.0          2
29: 2015-12-23 07:17:30   NA        NA 866.0          2
30: 2015-12-23 07:17:40   NA        NA 867.0          2
31: 2015-12-23 07:17:50   NA        NA 867.5          2
32: 2015-12-23 07:18:00   NA        NA 868.5          2
33: 2015-12-23 07:18:10   NA        NA 870.0          2
34: 2015-12-23 07:18:20   NA        NA 870.5          2
35: 2015-12-23 07:18:30   NA        NA 871.0          2
36: 2015-12-23 07:18:40   NA        NA 872.0          2
37: 2015-12-23 07:18:50   NA        NA 872.0          2
38: 2015-12-23 07:19:00 1267       970 874.0          2
39: 2015-12-23 07:19:10   NA        NA 875.0          2
40: 2015-12-23 07:19:20   NA        NA 875.0          2
41: 2015-12-23 07:19:30   NA        NA 876.0          2
42: 2015-12-23 07:19:40   NA        NA 876.5          2
43: 2015-12-23 07:19:50   NA        NA 876.0          2
44: 2015-12-23 07:20:00   NA        NA 878.0          2
45: 2015-12-23 07:20:10   NA        NA 876.0          2
46: 2015-12-23 07:20:20   NA        NA 875.5          2
47: 2015-12-23 07:20:30   NA        NA 875.5          2
48: 2015-12-23 07:20:40   NA        NA 874.0          2
49: 2015-12-23 07:20:50   NA        NA 872.5          2
50: 2015-12-23 07:21:00   NA        NA 870.5          2
           Bottom.Start Dive Max.Depth Depth DiveNumber
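For the 12-hour window mentioned in the question, the same join can be adapted by widening the bounds. The sketch below is only an illustration on made-up hourly data: the toy `df`, the `window_secs` variable and the dive positions are assumptions, and plain second arithmetic on POSIXct is used in place of lubridate so the example needs only data.table.

```r
library(data.table)

# Toy data: one reading per hour over 3 days, with two "dive" rows (made-up values)
df <- data.table(
  Bottom.Start = as.POSIXct("2015-12-22 00:00:00", tz = "UTC") + 3600 * (0:71),
  Dive = NA_integer_,
  Depth = as.numeric(1:72)
)
df[c(13, 61), Dive := c(1238L, 1267L)]

window_secs <- 12 * 3600  # 12 hours, expressed in seconds

# One row per dive, numbered, with the bounds of its window
df_c <- df[complete.cases(df)]
df_c[, `:=`(DiveNumber = seq_len(.N),
            Start = Bottom.Start - window_secs,
            End = Bottom.Start + window_secs)]

# Non-equi join: rows of df whose timestamp lies inside a dive's window
df_new <- df[df_c,
             .(Bottom.Start = x.Bottom.Start, Dive = x.Dive, Depth = x.Depth, DiveNumber),
             on = .(Bottom.Start >= Start, Bottom.Start <= End)]
```

One thing to keep in mind with wide windows: if two dives lie closer together than twice the window, a row that falls inside both windows is returned once per dive, so it appears in `df_new` under each DiveNumber.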

Comments:

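Ben, thank you so much! This works perfectly :D Would you mind explaining in layman's terms what the last two lines of code do? In particular, what do ':=' and '.' do, and why is there an 'x.Bottom.Start' in the last line? I would love to understand it for future use :)

:= is assignment by reference in data.table; it does not copy the data. See the linked reference for details. In the case above, I just added 3 new columns (one for the DiveNumber count and 2 more for the start and end times)... As for the x. prefix on Bottom.Start, it lets you refer to that specific column of the first data.table in the join, which would otherwise be masked by the other table's values... The period . in data.table syntax is just an alias for list... More detail on the x. prefix used in non-equi joins can be found in the linked reference.

Answer 2: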

You can try using difftime, lag and cumsum:

library(dplyr)

df %>%
  mutate(DiveNumber = cumsum(c(TRUE, difftime(Bottom.Start, 
                             lag(Bottom.Start), units = 'mins')[-1] > 2)))

#           Bottom.Start Dive Max.Depth Depth DiveNumber
# 1: 2015-12-22 01:07:00   NA        NA 311.0          1
# 2: 2015-12-22 01:07:10   NA        NA 308.5          1
# 3: 2015-12-22 01:07:20   NA        NA 307.0          1
# 4: 2015-12-22 01:07:30   NA        NA 306.5          1
# 5: 2015-12-22 01:07:40   NA        NA 305.5          1
# 6: 2015-12-22 01:07:50   NA        NA 308.5          1
# 7: 2015-12-22 01:08:00   NA        NA 318.5          1
# 8: 2015-12-22 01:08:10   NA        NA 331.0          1
#...
#...
#37: 2015-12-22 01:13:00   NA        NA 637.5          1
#38: 2015-12-22 01:13:10   NA        NA 655.0          1
#39: 2015-12-23 07:17:00   NA        NA 863.0          2
#40: 2015-12-23 07:17:10   NA        NA 863.5          2
#41: 2015-12-23 07:17:20   NA        NA 865.0          2
#42: 2015-12-23 07:17:30   NA        NA 866.0          2
#43: 2015-12-23 07:17:40   NA        NA 867.0          2
#...
#...
#74: 2015-12-23 07:22:50   NA        NA 907.0          2
#75: 2015-12-23 07:23:00   NA        NA 922.5          2

For the actual goal of 12 hours, you can change units = 'mins' to units = 'hours' and change > 2 to > 12.
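A minimal sketch of that 12-hour variant on made-up timestamps follows. The toy `df`, the `gap_hours` variable and the intermediate `gap` column are assumptions added for illustration. As the output above suggests, this approach numbers runs of rows separated by gaps longer than the threshold; it does not trim rows to a window around each dive.

```r
library(dplyr)

# Toy timestamps at hours 0, 1, 2, 20, 21 and 40 (made-up values)
df <- tibble(
  Bottom.Start = as.POSIXct("2015-12-22 00:00:00", tz = "UTC") +
    3600 * c(0, 1, 2, 20, 21, 40)
)

gap_hours <- 12  # a gap longer than this starts a new group

df_grouped <- df %>%
  mutate(
    # Hours elapsed since the previous row (NA for the first row)
    gap = as.numeric(difftime(Bottom.Start, lag(Bottom.Start), units = "hours")),
    # Start a new group on the first row and after every long gap
    DiveNumber = cumsum(is.na(gap) | gap > gap_hours)
  )
```

Here the first three rows fall in group 1, the next two in group 2, and the last row in group 3.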
