Pandas: How to merge two dataframes on two different overlapping timeseries
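【Title】: Pandas: How to merge two dataframes on two different overlapping timeseries
【Posted】: 2020-04-20 12:31:51
【Question】: I have created two dataframes that both cover all of 2012, with a datetime column as the overlapping column. df1 has rows/samples at millisecond resolution, while df2 has one row every 15 minutes. They obviously overlap, but how do I merge them so that the df2 rows are inserted into df1 at the position given by their time? I tried an outer merge, which should be the right choice, but I also tried 'inner', 'left', and even 'right'.
The merge adds the columns from df2, but it does not seem to add the rows from df2. I have only included a sample of my dataset, because df1 has more than 100 million samples. Here are CSVs with 10,000 rows each: df1, df2. Thanks a lot for your help :-)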
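import pandas as pd
import datetime

def mergeDF(lowTF, highTF):
    # Outer-join on exactly equal 'Time' values, then forward-fill the gaps
    tf_merge = pd.merge(lowTF, highTF, on='Time', how='outer')
    # .ffill() replaces the deprecated fillna(method='ffill')
    fill_merge = tf_merge.ffill()
    return fill_merge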
df1:
Time,Year,Month,Day,Hour
2012-01-09 00:00:00.653,2012,1,9,0
2012-01-09 00:00:01.388,2012,1,9,0
2012-01-09 00:00:01.739,2012,1,9,0
2012-01-09 00:00:02.265,2012,1,9,0
2012-01-09 00:00:03.349,2012,1,9,0
2012-01-09 00:00:03.489,2012,1,9,0
2012-01-09 00:00:04.311,2012,1,9,0
2012-01-09 00:00:04.719,2012,1,9,0
2012-01-09 00:00:05.384,2012,1,9,0
2012-01-09 00:00:05.800,2012,1,9,0
df2:
Time,DayOfWeak,ext_Volume,15_Absorption Volume,15_Bag Holding
2012-01-09 00:00:00,1,679,0,0
2012-01-09 00:15:00,1,988,0,0
2012-01-09 00:30:00,1,718,0,0
2012-01-09 00:45:00,1,583,0,0
2012-01-09 01:00:00,1,885,0,0
2012-01-09 01:15:00,1,589,0,0
2012-01-09 01:30:00,1,611,0,0
2012-01-09 01:45:00,1,620,0,0
2012-01-09 02:00:00,1,657,0,0
2012-01-09 02:15:00,1,691,0,0
merged = mergeDF(df1,df2)
merged
Time,Year,Month,Day,Hour,DayOfWeak,ext_Volume,15_Absorption Volume,15_Bag Holding
2012-01-09 00:00:00.653,2012,1,9,0,,,,
2012-01-09 00:00:01.388,2012,1,9,0,,,,
2012-01-09 00:00:01.739,2012,1,9,0,,,,
2012-01-09 00:00:02.265,2012,1,9,0,,,,
2012-01-09 00:00:03.349,2012,1,9,0,,,,
2012-01-09 00:00:03.489,2012,1,9,0,,,,
2012-01-09 00:00:04.311,2012,1,9,0,,,,
2012-01-09 00:00:04.719,2012,1,9,0,,,,
2012-01-09 00:00:05.384,2012,1,9,0,,,,
2012-01-09 00:00:05.800,2012,1,9,0,,,,
【Comments】:
Welcome to SO! Could you provide a reproducible example? Take a quick look at how to make good pandas example
【Answer 1】: I think the most intuitive way is:
pd.merge_asof(DF1, DF2, on='Time')
To give a more instructive example, I changed the minutes in the last two rows of DF1 to 15 and got:
                     Time  Year  Month  Day  Volume  DayOfWeek_x  ext_Volume  15_Absorption Volume
0 2012-01-09 00:00:00.653  2012      1    9       3            1         679                     0
1 2012-01-09 00:00:01.388  2012      1    9       2            1         679                     0
2 2012-01-09 00:00:01.739  2012      1    9       2            1         679                     0
3 2012-01-09 00:15:02.265  2012      1    9       2            1         988                     0
4 2012-01-09 00:15:03.349  2012      1    9       2            1         988                     0
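As you can see, the rows with index 0, 1, and 2 were matched with Time == 00:00:00, while the last two were matched with Time == 00:15:00, which is easy to verify from the ext_Volume column.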
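The matching described above can be reproduced with a small self-contained sketch. It assumes the `Time` columns are parsed as datetimes and sorted ascending, which `merge_asof` requires. The rows are a shortened excerpt of the samples in the question (with the two 00:15 rows from this answer), and `df2` is trimmed to three columns for brevity.

```python
import io
import pandas as pd

# Shortened excerpts of the question's samples; the two 00:15 rows
# follow the modification described in the answer.
df1_csv = """Time,Year,Month,Day,Hour
2012-01-09 00:00:00.653,2012,1,9,0
2012-01-09 00:00:01.388,2012,1,9,0
2012-01-09 00:00:01.739,2012,1,9,0
2012-01-09 00:15:02.265,2012,1,9,0
2012-01-09 00:15:03.349,2012,1,9,0
"""
df2_csv = """Time,DayOfWeak,ext_Volume
2012-01-09 00:00:00,1,679
2012-01-09 00:15:00,1,988
2012-01-09 00:30:00,1,718
"""

# merge_asof needs a datetime (not string) key, sorted ascending in both frames.
df1 = pd.read_csv(io.StringIO(df1_csv), parse_dates=["Time"]).sort_values("Time")
df2 = pd.read_csv(io.StringIO(df2_csv), parse_dates=["Time"]).sort_values("Time")

# Default direction="backward": each df1 row takes the most recent
# df2 row whose Time is less than or equal to its own.
merged = pd.merge_asof(df1, df2, on="Time")
print(merged[["Time", "ext_Volume"]])
```

The result keeps one row per `df1` row; the 00:30:00 row of `df2` is not used because no `df1` timestamp falls at or after it.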
【Comments】:
Thanks, Valdi_Bo. It works great and even makes the .fillna code redundant, since the values are copied to the other rows.
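To illustrate why the forward-fill becomes redundant, here is a rough comparison of the two approaches on a tiny made-up excerpt. The frame names `low` and `high` are hypothetical, and the sketch assumes `Time` is a datetime column in both frames. The outer merge keeps the 15-minute rows as extra rows and needs a sort plus forward-fill, while `merge_asof` returns only the rows of the left frame, already filled.

```python
import pandas as pd

# Hypothetical miniature versions of df1 (millisecond rows) and df2 (15-minute rows).
low = pd.DataFrame({
    "Time": pd.to_datetime([
        "2012-01-09 00:00:00.653",
        "2012-01-09 00:00:01.388",
        "2012-01-09 00:15:02.265",
    ]),
    "Hour": [0, 0, 0],
})
high = pd.DataFrame({
    "Time": pd.to_datetime(["2012-01-09 00:00:00", "2012-01-09 00:15:00"]),
    "ext_Volume": [679, 988],
})

# Approach from the answer: one call, one output row per row of `low`.
asof = pd.merge_asof(low, high, on="Time")

# Approach from the question: outer merge on exact timestamps, sort by time,
# then forward-fill the column that came from `high`.
outer = pd.merge(low, high, on="Time", how="outer").sort_values("Time")
outer["ext_Volume"] = outer["ext_Volume"].ffill()

# Keep only the rows that originated in `low` to compare like with like.
outer_low = outer[outer["Time"].isin(low["Time"])].reset_index(drop=True)

print(len(asof), len(outer))
print(asof["ext_Volume"].tolist(), outer_low["ext_Volume"].tolist())
```

Both routes assign the same `ext_Volume` to each millisecond row here; the outer merge additionally carries the rows of `high`, which matters for memory when the left frame has on the order of 100 million rows.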