Combining a dataframe of dates and a dataframe of values in Pandas
[Posted]: 2019-11-10 00:53:53
[Question]:
I have two dataframes, one of dates and one of values, that need to be combined.
import pandas as pd

# Values per resource; columns '0'..'10' are position numbers, not dates
df_Values = pd.DataFrame({
    'Resource': ['Mechanical', 'Electrical', 'Pipelines', 'Process', 'Project Management'],
    '0': [0.005, 0.005, 0.040, 0.075, 0.005],
    '1': [0.005, 0.005, 0.040, 0.075, 0.005],
    '2': [0.005, 0.005, 0.040, 0.075, 0.005],
    '3': [0.005, 0.005, 0.040, 0.075, 0.005],
    '4': [0.005, 0.040, 2000, 2000, 2000],
    '5': [0.005, 0.005, float("nan"), 50, float("nan")],
    '6': [float("nan"), 0.005, float("nan"), 50, float("nan")],
    '7': [float("nan"), 0.040, float("nan"), 50, float("nan")],
    '8': [float("nan"), 0.005, float("nan"), 50, float("nan")],
    '9': [float("nan"), 0.040, float("nan"), float("nan"), float("nan")],
    '10': [float("nan"), 0.040, float("nan"), float("nan"), float("nan")]})
# Dates per resource; each cell is the date for the value at the same position in df_Values
df_Dates = pd.DataFrame({
    'Resource': ['Mechanical', 'Electrical', 'Pipelines', 'Process', 'Project Management'],
    '0': ['2019-01-03', '2019-01-05', '2019-01-08', '2019-03-04', '2019-05-11'],
    '1': ['2019-01-04', '2019-01-06', '2019-01-09', '2019-03-05', '2019-05-12'],
    '2': ['2019-01-05', '2019-01-07', '2019-01-10', '2019-03-06', '2019-05-13'],
    '3': ['2019-01-06', '2019-01-08', '2019-01-11', '2019-03-07', '2019-05-14'],
    '4': ['2019-01-07', '2019-01-09', '2019-01-12', '2019-03-08', '2019-05-15'],
    '5': ['2019-01-08', '2019-01-10', float("nan"), '2019-03-09', float("nan")],
    '6': [float("nan"), '2019-01-11', float("nan"), '2019-03-10', float("nan")],
    '7': [float("nan"), '2019-01-12', float("nan"), '2019-03-11', float("nan")],
    '8': [float("nan"), '2019-01-13', float("nan"), '2019-03-12', float("nan")],
    '9': [float("nan"), '2019-01-14', float("nan"), float("nan"), float("nan")],
    '10': [float("nan"), '2019-01-15', float("nan"), float("nan"), float("nan")]})
I am trying to combine them so that the column headers are the dates and the corresponding values are placed in the rows.
Like this:
df_Result = pd.DataFrame({
    'Resource': ['Mechanical', 'Electrical', 'Pipelines', 'Process', 'Project Management'],
    '2019-01-03': [0.005, float("nan"), float("nan"), float("nan"), 0.005],
    '2019-01-04': [0.005, float("nan"), float("nan"), 0.075, 0.005],
    '2019-01-05': [0.040, float("nan"), float("nan"), 0.075, 0.005],
    '2019-01-06': [0.075, float("nan"), float("nan"), 0.075, 0.005],
    '2019-01-07': [0.005, float("nan"), float("nan"), 2000, 2000],
    '2019-01-08': [float("nan"), float("nan"), 0.040, 50, float("nan")],
    '2019-01-09': [float("nan"), float("nan"), 0.040, 50, float("nan")],
    '2019-01-10': [float("nan"), 0.005, 0.040, 50, float("nan")],
    '2019-01-11': [float("nan"), 0.005, 0.040, 50, float("nan")],
    '2019-01-12': [float("nan"), 0.005, 2000, float("nan"), float("nan")],
    '2019-01-13': [float("nan"), 0.005, float("nan"), float("nan"), float("nan")]})
Any ideas on how to achieve this?
The end goal is to distribute these values across the dates.
Thanks,
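[Question comments]:
Why are your NaNs strings? That makes this doubly inconvenient.
I'll change that to make it a bit easier.
[Answer 1]:
Consider using melt to reshape both dataframes to long format, then merge the two on Resource and the position number, and finally use pivot_table to reshape back to wide format: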
mdf = pd.merge(df_Values.melt(id_vars='Resource', var_name='Num', value_name='Val'),
               df_Dates.melt(id_vars='Resource', var_name='Num', value_name='Date'),
               on=['Resource', 'Num'])
pvt_df = mdf.pivot_table(index='Resource', columns='Date', values='Val')
Output
pvt_df
# Date 2019-01-03 2019-01-04 2019-01-05 2019-01-06 2019-01-07 2019-01-08 2019-01-09 2019-01-10 2019-01-11 \
# Resource
# Electrical NaN NaN 0.005 0.005 0.005 0.005 0.04 0.005 0.005
# Mechanical 0.005 0.005 0.005 0.005 0.005 0.005 NaN NaN NaN
# Pipelines NaN NaN NaN NaN NaN 0.040 0.04 0.040 0.040
# Process NaN NaN NaN NaN NaN NaN NaN NaN NaN
# Project Management NaN NaN NaN NaN NaN NaN NaN NaN NaN
#
# Date 2019-01-12 2019-01-13 2019-01-14 2019-01-15 2019-03-04 2019-03-05 2019-03-06 2019-03-07 2019-03-08 \
# Resource
# Electrical 0.04 0.005 0.04 0.04 NaN NaN NaN NaN NaN
# Mechanical NaN NaN NaN NaN NaN NaN NaN NaN NaN
# Pipelines 2000.00 NaN NaN NaN NaN NaN NaN NaN NaN
# Process NaN NaN NaN NaN 0.075 0.075 0.075 0.075 2000.0
# Project Management NaN NaN NaN NaN NaN NaN NaN NaN NaN
#
# Date 2019-03-09 2019-03-10 2019-03-11 2019-03-12 2019-05-11 2019-05-12 2019-05-13 2019-05-14 2019-05-15
# Resource
# Electrical NaN NaN NaN NaN NaN NaN NaN NaN NaN
# Mechanical NaN NaN NaN NaN NaN NaN NaN NaN NaN
# Pipelines NaN NaN NaN NaN NaN NaN NaN NaN NaN
# Process 50.0 50.0 50.0 50.0 NaN NaN NaN NaN NaN
# Project Management NaN NaN NaN NaN 0.005 0.005 0.005 0.005 2000.0
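A few follow-ups on the output above, as a hedged sketch rather than part of the original answer. pivot_table sorts the Resource index alphabetically, keeps the dates as strings, and aggregates with the mean by default, so two values for the same resource on the same date would be averaged. The toy frames below (df_values, df_dates, long_df, wide_df are made-up names and data, much smaller than the question's) show one way to get real timestamp columns and the original row order back, assuming that is what you want:

```python
import pandas as pd

# Hypothetical toy data standing in for df_Values / df_Dates from the question.
df_values = pd.DataFrame({
    "Resource": ["Mechanical", "Electrical"],
    "0": [0.005, 0.040],
    "1": [0.010, float("nan")],
})
df_dates = pd.DataFrame({
    "Resource": ["Mechanical", "Electrical"],
    "0": ["2019-01-03", "2019-01-04"],
    "1": ["2019-01-04", float("nan")],
})

# Same melt -> merge steps as in the answer: one row per (Resource, Num) pair.
long_df = pd.merge(
    df_values.melt(id_vars="Resource", var_name="Num", value_name="Val"),
    df_dates.melt(id_vars="Resource", var_name="Num", value_name="Date"),
    on=["Resource", "Num"],
)

# Parse the date strings so the pivoted columns are timestamps, not text.
# Missing dates become NaT and are left out of the pivot.
long_df["Date"] = pd.to_datetime(long_df["Date"])

# aggfunc="mean" is pivot_table's default, written out here to make it visible;
# swap in another aggregation if duplicates should be combined differently.
wide_df = long_df.pivot_table(
    index="Resource", columns="Date", values="Val", aggfunc="mean"
)

# pivot_table sorts the index; reindex restores the order used in df_values.
wide_df = wide_df.reindex(df_values["Resource"])

print(wide_df)
```

If you need Resource back as an ordinary column, as in df_Result, calling wide_df.reset_index() afterwards should do it.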