Python: How to join tables on a time condition
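Hi, I have a raw dataset called df1. I want to join df2 to df1 under the following conditions: 1) Match on the CaseNo column of df2 and df1. 2) For each CaseNo, the RequestDate of df2 must fall between the Movement_Start_Date and Movement_End_Date of the current df1 row. 3) If more than one RequestDate satisfies condition 2, pick the most recent one (at most one RequestDate per Movement_Sequence_No).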
import pandas as pd

df1 = pd.DataFrame({'CaseNo': [1,1,1,1,2,2,2,2],
                    'Movement_Sequence_No': [1,2,3,4,1,2,3,4],
                    'Movement_Start_Date': ['2020-02-09 22:17:00','2020-02-10 17:19:41','2020-02-17 08:04:19',
                                            '2020-02-18 11:22:52','2020-02-12 23:00:00','2020-02-24 10:26:35',
                                            '2020-03-03 17:50:00','2020-03-17 08:24:19'],
                    'Movement_End_Date': ['2020-02-10 17:19:41','2020-02-17 08:04:19','2020-02-18 11:22:52',
                                          '2020-02-25 13:55:37','2020-02-24 10:26:35','2020-03-03 17:50:00',
                                          '9999-12-31 23:59:59','2020-03-18 18:50:00'],
                    'Category': ['A','A','A','A','B','B','B','B']})
df2 = pd.DataFrame({'CaseNo': [1,1,1,1,1,1,2,2,2,2,2],
                    'RequestDate': ['2020-02-16 13:04:20','2020-02-17 09:10:10','2020-02-18 07:11:11',
                                    '2020-02-20 14:03:55','2020-02-21 21:30:30','2020-02-27 12:52:10',
                                    '2020-02-13 22:00:00','2020-03-15 09:40:00','2020-03-17 09:45:20',
                                    '2020-03-18 09:26:19','2020-03-18 15:10:10'],
                    'Platelets': ['189','207','190','195','188','241','328','266','180','210','310']})
Answer
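This is a job for merge_asof. First convert the columns to datetime, then chain merge_asof with sort_values, query and drop_duplicates to keep only the rows that satisfy your conditions. Finally, merge back onto df1 to recover the df1 rows that merge_asof did not match.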
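To see what merge_asof does on its own before it is chained with the other steps, here is a minimal sketch. It uses a few rows taken from the question plus one made-up request dated 2020-02-01 (not in the original data) to show the unmatched case; the frame names `starts` and `requests` are only for illustration.

```python
import pandas as pd

# Right table: each movement is identified by its start time.
starts = pd.DataFrame({
    "CaseNo": [1, 1, 2],
    "Movement_Sequence_No": [1, 2, 1],
    "Movement_Start_Date": pd.to_datetime(
        ["2020-02-09 22:17:00", "2020-02-10 17:19:41", "2020-02-12 23:00:00"]),
})

# Left table: the requests to place. The 2020-02-01 row is hypothetical.
requests = pd.DataFrame({
    "CaseNo": [1, 2, 2],
    "RequestDate": pd.to_datetime(
        ["2020-02-16 13:04:20", "2020-02-13 22:00:00", "2020-02-01 00:00:00"]),
})

# Both inputs must be sorted on their time key. Within each CaseNo,
# direction="backward" picks the latest start that is <= RequestDate.
matched = pd.merge_asof(
    requests.sort_values("RequestDate"),
    starts.sort_values("Movement_Start_Date"),
    by="CaseNo",
    left_on="RequestDate",
    right_on="Movement_Start_Date",
    direction="backward",
)
print(matched)
```

The request that precedes every start for its CaseNo comes back with missing values in the right-hand columns. merge_asof only looks at the start date, so the end-date check still has to be applied afterwards, which is what the query step in the full solution is for.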
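# Convert to datetime; the out-of-range '9999-12-31' end date is coerced to NaT
# and then replaced by the current time (floored to the second).
df1['Movement_Start_Date'] = pd.to_datetime(df1['Movement_Start_Date'])
df1['Movement_End_Date'] = pd.to_datetime(df1['Movement_End_Date'], errors='coerce')\
                             .fillna(pd.Timestamp.now()).dt.floor('s')
df2['RequestDate'] = pd.to_datetime(df2['RequestDate'])

df_f = (pd.merge_asof(df2.sort_values('RequestDate'),
                      df1.sort_values('Movement_Start_Date'),
                      by=['CaseNo'],
                      left_on=['RequestDate'], right_on=['Movement_Start_Date'],
                      direction='backward')  # latest start <= RequestDate per CaseNo
          .sort_values(['CaseNo', 'Movement_Sequence_No'])
          .query('RequestDate <= Movement_End_Date')  # drop requests after the end date
          .drop_duplicates(['CaseNo', 'Movement_Sequence_No'], keep='last')  # most recent request
       )

# Outer merge on the shared columns brings back the df1 rows without a match.
df_f = df1.merge(df_f, how='outer')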
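This gives the following (the Movement_End_Date in row 6 is the time at which the code was run, substituted for the 9999-12-31 placeholder):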
print(df_f)
   CaseNo  Movement_Sequence_No Movement_Start_Date   Movement_End_Date  \
0       1                     1 2020-02-09 22:17:00 2020-02-10 17:19:41
1       1                     2 2020-02-10 17:19:41 2020-02-17 08:04:19
2       1                     3 2020-02-17 08:04:19 2020-02-18 11:22:52
3       1                     4 2020-02-18 11:22:52 2020-02-25 13:55:37
4       2                     1 2020-02-12 23:00:00 2020-02-24 10:26:35
5       2                     2 2020-02-24 10:26:35 2020-03-03 17:50:00
6       2                     3 2020-03-03 17:50:00 2020-05-21 12:44:11
7       2                     4 2020-03-17 08:24:19 2020-03-18 18:50:00

  Category         RequestDate Platelets
0        A                 NaT       NaN
1        A 2020-02-16 13:04:20       189
2        A 2020-02-18 07:11:11       190
3        A 2020-02-21 21:30:30       188
4        B 2020-02-13 22:00:00       328
5        B                 NaT       NaN
6        B 2020-03-15 09:40:00       266
7        B 2020-03-18 15:10:10       310
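One way to sanity-check a result like this is to spell the three conditions out directly with an ordinary merge and a filter. The sketch below is not part of the original answer; it uses the first three movements of CaseNo 1 and four of its requests from the question, and the names `moves`, `requests`, `pairs` and `latest` are only for illustration. It pairs every movement with every request of the same case, so it is likely to be slower and more memory-hungry than merge_asof on large tables.

```python
import pandas as pd

moves = pd.DataFrame({
    "CaseNo": [1, 1, 1],
    "Movement_Sequence_No": [1, 2, 3],
    "Movement_Start_Date": pd.to_datetime(
        ["2020-02-09 22:17:00", "2020-02-10 17:19:41", "2020-02-17 08:04:19"]),
    "Movement_End_Date": pd.to_datetime(
        ["2020-02-10 17:19:41", "2020-02-17 08:04:19", "2020-02-18 11:22:52"]),
})
requests = pd.DataFrame({
    "CaseNo": [1, 1, 1, 1],
    "RequestDate": pd.to_datetime(
        ["2020-02-16 13:04:20", "2020-02-17 09:10:10",
         "2020-02-18 07:11:11", "2020-02-20 14:03:55"]),
    "Platelets": ["189", "207", "190", "195"],
})

# Condition 1: pair every movement with every request of the same CaseNo.
pairs = moves.merge(requests, on="CaseNo")

# Condition 2: keep pairs whose RequestDate lies inside the movement window
# (Series.between is inclusive at both ends by default).
in_window = pairs["RequestDate"].between(
    pairs["Movement_Start_Date"], pairs["Movement_End_Date"])

# Condition 3: per movement, keep only the most recent request.
latest = (pairs[in_window]
          .sort_values("RequestDate")
          .drop_duplicates(["CaseNo", "Movement_Sequence_No"], keep="last"))

# Left merge back so movements without any request are kept.
result = moves.merge(
    latest[["CaseNo", "Movement_Sequence_No", "RequestDate", "Platelets"]],
    on=["CaseNo", "Movement_Sequence_No"], how="left")
print(result)
```

Because both window ends are inclusive here, a request falling exactly on a boundary shared by two consecutive movements would match both of them, whereas merge_asof with direction='backward' assigns it to a single movement. The sample data contains no such case.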