Merge on nearest retrospective timestamp and fill forward in pandas
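Posted: 2021-06-13 07:57:03
Question:
I'm having a hard time getting the hang of pandas' special merge functions such as merge_asof().
I have two dataframes: coords, which holds pings from an EV's GPS, and info, which holds other EV attributes such as the navigation destination and battery level. My goal is to merge them so that the output dataframe has as many rows as the two dataframes combined. For example: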
coords.shape
(10, 3)
coords
ts lat lng
2021-01-02 16:08:24.067971 58.3019 -134.4197
2021-01-06 12:54:18.535681 58.3021 -134.4195
2021-01-08 22:15:35.036423 58.3025 -134.4195
2021-01-16 01:10:39.610540 58.3029 -134.4193
2021-01-27 12:28:45.202376 58.3030 -134.4197
2021-01-30 05:32:09.404525 58.3031 -134.4190
2021-02-08 10:39:19.686159 58.3033 -134.4187
2021-02-15 01:30:16.733921 58.3039 -134.4187
2021-02-16 12:49:55.366025 58.3040 -134.4185
2021-02-19 23:57:57.369978 58.3041 -134.4181
info.shape
(3, 3)
info
ts nav_to battery
2021-01-26 12:47:52.972586 Juneau 90
2021-02-14 23:23:18.186058 Anchorage 50
2021-02-19 07:26:35.357977 Fairbanks 30
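info and coords should be merged so that the timestamps in ts are in sequential order, and each info row should be matched with the row in coords whose timestamp is the nearest one *before* it. Finally, nav_to, battery, lat and lng should be filled forward. The output for the example above would be: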
output
ts lat lng nav_to battery
2021-01-02 16:08:24.067971 58.3019 -134.4197 None NaN
2021-01-06 12:54:18.535681 58.3021 -134.4195 None NaN
2021-01-08 22:15:35.036423 58.3025 -134.4195 None NaN
2021-01-16 01:10:39.610540 58.3029 -134.4193 None NaN
2021-01-26 12:47:52.972586 58.3029 -134.4193 Juneau 90.0
2021-01-27 12:28:45.202376 58.3030 -134.4197 Juneau 90.0
2021-01-30 05:32:09.404525 58.3031 -134.4190 Juneau 90.0
2021-02-08 10:39:19.686159 58.3033 -134.4187 Juneau 90.0
2021-02-14 23:23:18.186058 58.3033 -134.4187 Anchorage 50.0
2021-02-15 01:30:16.733921 58.3039 -134.4187 Anchorage 50.0
2021-02-16 12:49:55.366025 58.3040 -134.4185 Anchorage 50.0
2021-02-19 07:26:35.357977 58.3040 -134.4185 Fairbanks 30.0
2021-02-19 23:57:57.369978 58.3041 -134.4181 Fairbanks 30.0
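I tried pd.merge_asof(coords, info, on="ts", direction="forward"), but that does not give the right result: it fills backward and keeps only the records from coords. What is the right command to produce the desired result in pandas?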
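To make the behaviour described in the question concrete, here is a minimal sketch on a reduced version of the data (three coords rows and one info row, with timestamps truncated to seconds for brevity; the variable names forward and backward are illustrative only). It shows that merge_asof always returns one row per row of the left frame, and that direction="forward" matches each left row to the next info row rather than the previous one.

```python
import pandas as pd

# Reduced sample taken from the question (timestamps truncated to seconds).
coords = pd.DataFrame({
    "ts": pd.to_datetime([
        "2021-01-16 01:10:39",
        "2021-01-27 12:28:45",
        "2021-01-30 05:32:09",
    ]),
    "lat": [58.3029, 58.3030, 58.3031],
    "lng": [-134.4193, -134.4197, -134.4190],
})
info = pd.DataFrame({
    "ts": pd.to_datetime(["2021-01-26 12:47:52"]),
    "nav_to": ["Juneau"],
    "battery": [90],
})

# direction="forward": each coords row takes the first info row at or AFTER it,
# so the Juneau record lands on the coords row that precedes it in time.
forward = pd.merge_asof(coords, info, on="ts", direction="forward")

# direction="backward" (the default): each coords row takes the last info row
# at or BEFORE it, which is the retrospective match the question asks for.
backward = pd.merge_asof(coords, info, on="ts")

print(forward)
print(backward)
```

In both cases the result has only the three coords rows, which is why merge_asof alone cannot produce an output whose length is the sum of both frames.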
Answer 1:
Try the default direction='backward', then concat the result with the second dataframe (here df1 is coords and df2 is info):
(pd.concat([pd.merge_asof(df1, df2, on='ts'), df2])
.sort_values('ts')
)
Output:
ts lat lng nav_to battery
0 2021-01-02 16:08:24.067971 58.3019 -134.4197 NaN NaN
1 2021-01-06 12:54:18.535681 58.3021 -134.4195 NaN NaN
2 2021-01-08 22:15:35.036423 58.3025 -134.4195 NaN NaN
3 2021-01-16 01:10:39.610540 58.3029 -134.4193 NaN NaN
0 2021-01-26 12:47:52.972586 NaN NaN Juneau 90.0
4 2021-01-27 12:28:45.202376 58.3030 -134.4197 Juneau 90.0
5 2021-01-30 05:32:09.404525 58.3031 -134.4190 Juneau 90.0
6 2021-02-08 10:39:19.686159 58.3033 -134.4187 Juneau 90.0
1 2021-02-14 23:23:18.186058 NaN NaN Anchorage 50.0
7 2021-02-15 01:30:16.733921 58.3039 -134.4187 Anchorage 50.0
8 2021-02-16 12:49:55.366025 58.3040 -134.4185 Anchorage 50.0
2 2021-02-19 07:26:35.357977 NaN NaN Fairbanks 30.0
9 2021-02-19 23:57:57.369978 58.3041 -134.4181 Fairbanks 30.0
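For readers who want to run this end to end, the following is a self-contained sketch of the same approach using the data from the question. The names merged and combined are illustrative, and the frames are built by hand here only because the question does not show how coords and info were created.

```python
import pandas as pd

coords = pd.DataFrame({
    "ts": pd.to_datetime([
        "2021-01-02 16:08:24.067971", "2021-01-06 12:54:18.535681",
        "2021-01-08 22:15:35.036423", "2021-01-16 01:10:39.610540",
        "2021-01-27 12:28:45.202376", "2021-01-30 05:32:09.404525",
        "2021-02-08 10:39:19.686159", "2021-02-15 01:30:16.733921",
        "2021-02-16 12:49:55.366025", "2021-02-19 23:57:57.369978",
    ]),
    "lat": [58.3019, 58.3021, 58.3025, 58.3029, 58.3030,
            58.3031, 58.3033, 58.3039, 58.3040, 58.3041],
    "lng": [-134.4197, -134.4195, -134.4195, -134.4193, -134.4197,
            -134.4190, -134.4187, -134.4187, -134.4185, -134.4181],
})
info = pd.DataFrame({
    "ts": pd.to_datetime([
        "2021-01-26 12:47:52.972586",
        "2021-02-14 23:23:18.186058",
        "2021-02-19 07:26:35.357977",
    ]),
    "nav_to": ["Juneau", "Anchorage", "Fairbanks"],
    "battery": [90, 50, 30],
})

# Step 1: attach to every coords row the most recent info row at or before it.
merged = pd.merge_asof(coords, info, on="ts")

# Step 2: append the info rows themselves and restore chronological order.
# The appended rows have no lat/lng columns, so those cells become NaN.
combined = pd.concat([merged, info]).sort_values("ts")

print(combined)
```

Both frames must already be sorted by ts for merge_asof to accept them, which holds for the sample data.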
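You can then forward-fill (ffill) the lat and lng columns so that each info row takes the coordinates of the preceding ping, as in the desired output. Or you can just call merge_asof twice: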
(pd.concat([pd.merge_asof(df1, df2, on='ts'),
pd.merge_asof(df2, df1, on='ts')
])
.sort_values('ts')
)
Output:
ts lat lng nav_to battery
0 2021-01-02 16:08:24.067971 58.3019 -134.4197 NaN NaN
1 2021-01-06 12:54:18.535681 58.3021 -134.4195 NaN NaN
2 2021-01-08 22:15:35.036423 58.3025 -134.4195 NaN NaN
3 2021-01-16 01:10:39.610540 58.3029 -134.4193 NaN NaN
0 2021-01-26 12:47:52.972586 58.3029 -134.4193 Juneau 90.0
4 2021-01-27 12:28:45.202376 58.3030 -134.4197 Juneau 90.0
5 2021-01-30 05:32:09.404525 58.3031 -134.4190 Juneau 90.0
6 2021-02-08 10:39:19.686159 58.3033 -134.4187 Juneau 90.0
1 2021-02-14 23:23:18.186058 58.3033 -134.4187 Anchorage 50.0
7 2021-02-15 01:30:16.733921 58.3039 -134.4187 Anchorage 50.0
8 2021-02-16 12:49:55.366025 58.3040 -134.4185 Anchorage 50.0
2 2021-02-19 07:26:35.357977 58.3040 -134.4185 Fairbanks 30.0
9 2021-02-19 23:57:57.369978 58.3041 -134.4181 Fairbanks 30.0
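The fill step for the first approach can be sketched as below, on a reduced subset of the question's data. It forward-fills lat and lng, which is what the desired output in the question shows, and compares the result with the double merge_asof variant. The names combined, filled and twice are illustrative, and the index is reset only to make the two results easy to compare.

```python
import pandas as pd

# Reduced subset of the question's data: four pings and two info records.
coords = pd.DataFrame({
    "ts": pd.to_datetime([
        "2021-01-16 01:10:39.610540", "2021-01-27 12:28:45.202376",
        "2021-02-08 10:39:19.686159", "2021-02-15 01:30:16.733921",
    ]),
    "lat": [58.3029, 58.3030, 58.3033, 58.3039],
    "lng": [-134.4193, -134.4197, -134.4187, -134.4187],
})
info = pd.DataFrame({
    "ts": pd.to_datetime([
        "2021-01-26 12:47:52.972586", "2021-02-14 23:23:18.186058",
    ]),
    "nav_to": ["Juneau", "Anchorage"],
    "battery": [90, 50],
})

# Approach 1: merge_asof + concat, then forward-fill the coordinate columns
# so each info row inherits the position of the ping just before it.
combined = pd.concat([pd.merge_asof(coords, info, on="ts"), info]).sort_values("ts")
filled = combined.reset_index(drop=True)
filled[["lat", "lng"]] = filled[["lat", "lng"]].ffill()

# Approach 2: merge_asof in both directions of the join, no fill needed.
twice = (
    pd.concat([
        pd.merge_asof(coords, info, on="ts"),
        pd.merge_asof(info, coords, on="ts"),
    ])
    .sort_values("ts")
    .reset_index(drop=True)
)

print(filled)
print(twice)
```

On this sample both approaches give the same lat and lng values. Rows that precede the first info record keep NaN in nav_to and battery, as in the question's expected output.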
Comments:
Thanks for the edited answer - how should the lat/lng NaNs be filled?
@the_darkside see the edited answer :-)