Merge Data Frames By Date With Unequal Dates
【Title】: Merge Data Frames By Date With Unequal Dates 【Posted】: 2018-10-01 08:37:51 【Question】: My process is as follows:
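- Import a CSV containing dates, activations, and cancellations
- Subset the data by activation or cancellation
- Pivot the data using aggfunc 'sum'
- Convert the result back to a data frame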
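Now I need to merge the two data frames, but some dates exist in one data frame and not in the other. Both data frames start on January 1, 2017 and end on December 31, 2017. Preferably, any index_month observation that has to be filled in the output would get a corresponding value of 0.
Here is the .head() from both data frames (posted as images):
For reference, here is the code so far: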
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import datetime
%matplotlib inline
#import data
directory1 = r"C:\python\Contracts"  # raw string so the backslashes are not treated as escapes
directory_source = os.path.join(directory1, "Contract_Data.csv")
df_source = pd.read_csv(directory_source)
#format date ranges as times
#df_source["Activation_Month"] = pd.to_datetime(df_source["Activation_Month"])
#df_source["Cancellation_Month"] = pd.to_datetime(df_source["Cancellation_Month"])
df_source["Activation_Day"] = pd.to_datetime(df_source["Activation_Day"])
df_source["Cancellation_Day"] = pd.to_datetime(df_source["Cancellation_Day"])
#subset the data based on status
df_active = df_source[df_source["Order Status"]=="Active"]
df_active = pd.DataFrame(df_active[["Activation_Day", "Event_Value"]].copy())
df_cancelled = df_source[df_source["Order Status"]=="Cancelled"]
df_cancelled = pd.DataFrame(df_cancelled[["Cancellation_Day", "Event_Value"]].copy())
#remove activations outside 2017 and cancellations outside 2017
df_cancelled = df_cancelled[(df_cancelled['Cancellation_Day'] > '2016-12-31') &
                            (df_cancelled['Cancellation_Day'] <= '2017-12-31')]
df_active = df_active[(df_active['Activation_Day'] > '2016-12-31') &
                      (df_active['Activation_Day'] <= '2017-12-31')]
#pivot the data to aggregate by day
df_active_aggregated = df_active.pivot_table(index='Activation_Day',
                                             values='Event_Value',
                                             aggfunc='sum')
df_cancelled_aggregated = df_cancelled.pivot_table(index='Cancellation_Day',
                                                   values='Event_Value',
                                                   aggfunc='sum')
#convert pivot tables back to useable dataframes
activations_aggregated = pd.DataFrame(df_active_aggregated.to_records())
cancellations_aggregated = pd.DataFrame(df_cancelled_aggregated.to_records())
#rename the time columns so they can be referenced when merging into one DF
activations_aggregated.columns = ["index_month", "Activations"]
#activations_aggregated = activations_aggregated.set_index(pd.DatetimeIndex(activations_aggregated["index_month"]))
cancellations_aggregated.columns = ["index_month", "Cancellations"]
#cancellations_aggregated = cancellations_aggregated.set_index(pd.DatetimeIndex(cancellations_aggregated["index_month"]))
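As a side note, the subset, pivot, `to_records()`, and rename steps above can probably be collapsed into one `groupby` call per frame. The following is only a minimal sketch on made-up rows, not the original CSV: the helper name `aggregate_by_day` and the sample values are assumptions for illustration, and it assumes the date columns carry no time-of-day component.

```python
import pandas as pd

# Made-up rows standing in for Contract_Data.csv (column names follow the post).
df_source = pd.DataFrame({
    "Order Status": ["Active", "Active", "Cancelled", "Active", "Cancelled"],
    "Activation_Day": pd.to_datetime(
        ["2017-01-01", "2017-01-01", "2016-11-15", "2017-01-03", "2016-12-20"]),
    "Cancellation_Day": pd.to_datetime(
        [None, None, "2017-01-02", None, "2017-01-02"]),
    "Event_Value": [1, 2, 3, 4, 5],
})


def aggregate_by_day(df, status, date_col, value_name):
    """Hypothetical helper: sum Event_Value per day in 2017 for one order status."""
    subset = df[df["Order Status"] == status]
    # Keep only rows whose date falls inside 2017 (NaT rows are dropped too).
    in_2017 = subset[date_col].between(pd.Timestamp("2017-01-01"),
                                       pd.Timestamp("2017-12-31"))
    return (
        subset[in_2017]
        .groupby(date_col)["Event_Value"]
        .sum()
        .rename(value_name)          # name the value column
        .rename_axis("index_month")  # name the date column
        .reset_index()               # back to a plain two-column DataFrame
    )


activations_aggregated = aggregate_by_day(
    df_source, "Active", "Activation_Day", "Activations")
cancellations_aggregated = aggregate_by_day(
    df_source, "Cancelled", "Cancellation_Day", "Cancellations")
print(activations_aggregated)
print(cancellations_aggregated)
```

Both results end up with an `index_month` column plus one value column, which is the shape the merge step expects.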
I know there are many posts that address similar problems, but I could not find anything that helped. Thanks to anyone who can help me!
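【Comments】:
Don't use images to share sample data; based on the snapshots provided, it should not be hard to make your example reproducible. What is the problem statement? How do you merge the two dfs? How do you handle the dates? How do you fill the empty cells?
【Answer 1】: You can try: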
activations_aggregated.merge(cancellations_aggregated, how='outer', on='index_month').fillna(0)
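To see what this one-liner does, here is a small sketch on made-up frames (the values and the shortened five-day window are assumptions, not the asker's data). It also shows one possible follow-up: an outer merge only covers dates that appear in at least one frame, so `reindex` over a full date range might be used if dates missing from both frames should also appear with 0.

```python
import pandas as pd

# Hypothetical stand-ins for the two aggregated frames; values are made up.
activations_aggregated = pd.DataFrame({
    "index_month": pd.to_datetime(["2017-01-01", "2017-01-02", "2017-01-04"]),
    "Activations": [10.0, 5.0, 7.0],
})
cancellations_aggregated = pd.DataFrame({
    "index_month": pd.to_datetime(["2017-01-02", "2017-01-03"]),
    "Cancellations": [2.0, 4.0],
})

# Outer merge keeps every date found in either frame; gaps become NaN, then 0.
merged = (
    activations_aggregated
    .merge(cancellations_aggregated, how="outer", on="index_month")
    .fillna(0)
    .sort_values("index_month")
    .reset_index(drop=True)
)
print(merged)

# Optional: add rows for dates that are missing from both frames.
# Assumption: the window is shortened to five days to keep the output small.
full_range = pd.date_range("2017-01-01", "2017-01-05", freq="D")
daily = (
    merged.set_index("index_month")
    .reindex(full_range, fill_value=0)  # new dates get 0 in every column
    .rename_axis("index_month")
    .reset_index()
)
print(daily)
```

On the real data the same `reindex` call would presumably take `pd.date_range("2017-01-01", "2017-12-31", freq="D")`, matching the date window stated in the question.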
【Comments】:
If the answer meets your requirements, please consider accepting it.