Combining a dataframe of dates and a dataframe of values in Pandas


Posted: 2019-11-10 00:53:53

Question:

I have two dataframes, one of Dates and one of Values, that need to be combined.

import pandas as pd

# One row per resource; columns '0'..'10' hold the value for each step
df_Values = pd.DataFrame({'Resource':['Mechanical','Electrical','Pipelines','Process','Project Management'],
                '0':[0.005, 0.005, 0.040, 0.075, 0.005],
                '1':[0.005, 0.005, 0.040, 0.075, 0.005],
                '2':[0.005, 0.005, 0.040, 0.075, 0.005],
                '3':[0.005, 0.005, 0.040, 0.075, 0.005],
                '4':[0.005, 0.040, 2000, 2000, 2000],
                '5':[0.005, 0.005, float("nan") , 50, float("nan") ],
                '6':[float("nan"), 0.005, float("nan"), 50, float("nan")],
                '7':[float("nan"), 0.040, float("nan"), 50, float("nan")],
                '8':[float("nan"), 0.005, float("nan"), 50, float("nan")],
                '9':[float("nan"), 0.040, float("nan"), float("nan"), float("nan")],
                '10':[float("nan"), 0.040, float("nan"), float("nan"), float("nan")]})


# Same layout as df_Values, but each cell holds the date (as a string) for that step
df_Dates = pd.DataFrame({'Resource':['Mechanical','Electrical','Pipelines','Process','Project Management'],
                '0':['2019-01-03', '2019-01-05', '2019-01-08', '2019-03-04', '2019-05-11'],
                '1':['2019-01-04', '2019-01-06', '2019-01-09', '2019-03-05', '2019-05-12'],
                '2':['2019-01-05', '2019-01-07', '2019-01-10', '2019-03-06', '2019-05-13'],
                '3':['2019-01-06', '2019-01-08', '2019-01-11', '2019-03-07', '2019-05-14'],
                '4':['2019-01-07', '2019-01-09', '2019-01-12', '2019-03-08', '2019-05-15'],
                '5':['2019-01-08', '2019-01-10', float("nan"), '2019-03-09', float("nan")],
                '6':[float("nan"), '2019-01-11', float("nan"), '2019-03-10', float("nan")],
                '7':[float("nan"), '2019-01-12', float("nan"), '2019-03-11', float("nan")],
                '8':[float("nan"), '2019-01-13', float("nan"), '2019-03-12', float("nan")],
                '9':[float("nan"), '2019-01-14', float("nan"), float("nan"), float("nan")],
                '10':[float("nan"), '2019-01-15', float("nan"), float("nan"), float("nan")]})

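I'm trying to combine them so that the column headers are the dates and the corresponding values end up in the data rows.

Something like this: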

df_Result = pd.DataFrame({'Resource':['Mechanical','Electrical','Pipelines','Process','Project Management'],
                '2019-01-03':[0.005, float("nan"), float("nan"), float("nan"), 0.005],
                '2019-01-04':[0.005, float("nan"), float("nan"), 0.075, 0.005],
                '2019-01-05':[0.040, float("nan"), float("nan"), 0.075, 0.005],
                '2019-01-06':[0.075, float("nan"), float("nan"), 0.075, 0.005],
                '2019-01-07':[0.005, float("nan"), float("nan"), 2000, 2000],
                '2019-01-08':[float("nan"), float("nan"), 0.040, 50, float("nan")],
                '2019-01-09':[float("nan"), float("nan"), 0.040, 50, float("nan")],
                '2019-01-10':[float("nan"), 0.005, 0.040, 50, float("nan")],
                '2019-01-11':[float("nan"), 0.005, 0.040, 50, float("nan")],
                '2019-01-12':[float("nan"), 0.005, 2000, float("nan"), float("nan")],
                '2019-01-13':[float("nan"), 0.005, float("nan"), float("nan"), float("nan")]})

Any ideas on how to achieve this?

The end goal is to distribute these values across the dates.

Thanks,

Comments:

Why are your NaNs strings? That makes it awkward to convert the columns to float.
I'll change that to make it a bit easier.

Answer 1:

Consider using melt to reshape both data frames into long format, then merge the two, and finally use pivot_table to reshape the result back into wide format:

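# Melt both frames to long format and pair each value with its date on (Resource, Num)
mdf = pd.merge(df_Values.melt(id_vars='Resource', var_name='Num', value_name='Val'),
               df_Dates.melt(id_vars='Resource', var_name='Num', value_name='Date'),
               on=['Resource', 'Num'])

# Pivot back to wide format: one row per Resource, one column per Date
pvt_df = mdf.pivot_table(index='Resource', columns='Date', values='Val')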

Output

pvt_df 

# Date                2019-01-03  2019-01-04  2019-01-05  2019-01-06  2019-01-07  2019-01-08  2019-01-09  2019-01-10  2019-01-11  \
# Resource                                                                                                                         
# Electrical                 NaN         NaN       0.005       0.005       0.005       0.005        0.04       0.005       0.005   
# Mechanical               0.005       0.005       0.005       0.005       0.005       0.005         NaN         NaN         NaN   
# Pipelines                  NaN         NaN         NaN         NaN         NaN       0.040        0.04       0.040       0.040   
# Process                    NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN   
# Project Management         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN   
# 
# Date                2019-01-12  2019-01-13  2019-01-14  2019-01-15  2019-03-04  2019-03-05  2019-03-06  2019-03-07  2019-03-08  \
# Resource                                                                                                                         
# Electrical                0.04       0.005        0.04        0.04         NaN         NaN         NaN         NaN         NaN   
# Mechanical                 NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN   
# Pipelines              2000.00         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN   
# Process                    NaN         NaN         NaN         NaN       0.075       0.075       0.075       0.075      2000.0   
# Project Management         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN   
# 
# Date                2019-03-09  2019-03-10  2019-03-11  2019-03-12  2019-05-11  2019-05-12  2019-05-13  2019-05-14  2019-05-15  
# Resource                                                                                                                        
# Electrical                 NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN  
# Mechanical                 NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN  
# Pipelines                  NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN         NaN  
# Process                   50.0        50.0        50.0        50.0         NaN         NaN         NaN         NaN         NaN  
# Project Management         NaN         NaN         NaN         NaN       0.005       0.005       0.005       0.005      2000.0  
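To make each step of the melt / merge / pivot_table chain easier to follow, here is a minimal sketch on two tiny, hypothetical frames (`vals`, `dates`, and the resources 'A' and 'B' are invented for illustration; they mirror the layout of df_Values and df_Dates). It assumes a reasonably recent pandas where `DataFrame.melt` and `DataFrame.pivot_table` behave as in the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature versions of df_Values / df_Dates
vals = pd.DataFrame({'Resource': ['A', 'B'],
                     '0': [1.0, 3.0],
                     '1': [2.0, np.nan]})
dates = pd.DataFrame({'Resource': ['A', 'B'],
                      '0': ['2019-01-03', '2019-01-04'],
                      '1': ['2019-01-04', np.nan]})

# Step 1: melt to long format; 'Num' keeps the original column label ('0', '1')
long_vals = vals.melt(id_vars='Resource', var_name='Num', value_name='Val')
long_dates = dates.melt(id_vars='Resource', var_name='Num', value_name='Date')

# Step 2: inner-join on the shared (Resource, Num) key so each value gets its date
mdf = pd.merge(long_vals, long_dates, on=['Resource', 'Num'])

# Step 3: pivot to wide format; rows whose Date is NaN are dropped as group keys
pvt = mdf.pivot_table(index='Resource', columns='Date', values='Val')

print(mdf)
print(pvt)
```

Because pivot_table aggregates with the mean by default, two values landing on the same Resource and Date would be averaged; passing a different `aggfunc` (for example `'sum'`) would change that.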

