Joining two pandas dataframes with multi-indexed columns

Posted: 2020-12-04 07:15:33

【Question】:

I want to join two pandas dataframes, one of which has multi-indexed columns.

This is how I build the first dataframe.

import pandas as pd

data_large = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 60, 50], "buy": [20, 30, 40]})
data_mini = pd.DataFrame({"name": ["b", "c", "d"], "sell": [60, 20, 10], "buy": [30, 50, 40]})
data_topix = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 80, 0], "buy": [70, 30, 40]})

# Concatenate side by side; the keys become the top column level
df_out = (pd.concat([dfi.set_index('name') for dfi in [data_large, data_mini, data_topix]],
                    keys=['Large', 'Mini', 'Topix'], axis=1)
            .rename_axis(mapper=['name'], axis=0)
            .rename_axis(mapper=['product', 'buy_sell'], axis=1))
df_out

This is the second dataframe.

group = pd.DataFrame({"name": ["a", "b", "c", "d"], "group": [1, 1, 2, 2]})
group

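How can I join the second dataframe to the first on the name column while keeping the multi-indexed columns?

This does not work; it flattens the MultiIndex.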

df_final = df_out.merge(group, on=['name'], how='left')

Any help would be appreciated!


【Answer 1】:

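If you need a MultiIndex in the columns after the merge, first convert the columns of group to a MultiIndex. Here the name column is also moved into the index so that the merge runs on the index; otherwise both columns (name and group) would have to be converted to MultiIndex columns.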

group = group.set_index('name')
group.columns = pd.MultiIndex.from_product([group.columns, ['new']])

df_final = df_out.merge(group, on=['name'], how='left')

Or:

df_final = df_out.merge(group, left_index=True, right_index=True, how='left')

print (df_final)
product  Large        Mini       Topix       group
buy_sell  sell   buy  sell   buy  sell   buy   new
name                                              
a         10.0  20.0   NaN   NaN  10.0  70.0     1
b         60.0  30.0  60.0  30.0  80.0  30.0     1
c         50.0  40.0  20.0  50.0   0.0  40.0     2
d          NaN   NaN  10.0  40.0   NaN   NaN     2
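
For readers who want to run the steps above end to end, here is a minimal self-contained sketch. It rebuilds the frames from the question (with the dict braces that the constructor needs) and lifts group to two column levels before merging on the index. The second-level label "new" is only the placeholder used in this answer; any label would do.

```python
import pandas as pd

# Frames from the question
data_large = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 60, 50], "buy": [20, 30, 40]})
data_mini = pd.DataFrame({"name": ["b", "c", "d"], "sell": [60, 20, 10], "buy": [30, 50, 40]})
data_topix = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 80, 0], "buy": [70, 30, 40]})

df_out = (
    pd.concat(
        [d.set_index("name") for d in (data_large, data_mini, data_topix)],
        keys=["Large", "Mini", "Topix"],
        axis=1,
    )
    .rename_axis(index="name", columns=["product", "buy_sell"])
)

# Move name into the index and give group a second column level,
# so both frames have two column levels before the merge.
group = pd.DataFrame({"name": ["a", "b", "c", "d"], "group": [1, 1, 2, 2]}).set_index("name")
group.columns = pd.MultiIndex.from_product([group.columns, ["new"]])

df_final = df_out.merge(group, left_index=True, right_index=True, how="left")
print(df_final)
```

Because both inputs have the same number of column levels, the result keeps two-level columns and no level-mismatch warning is involved.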

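Another possible approach is to convert the columns to a MultiIndex after the merge. This uses the original flat group, and on the pandas version used here the merge only emits the warning shown; newer pandas releases may refuse to merge frames with different numbers of column levels.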

df_final = df_out.merge(group, on=['name'], how='left')

UserWarning: merging between different levels can give an unintended result (2 levels on the left, 1 on the right) warnings.warn(msg, UserWarning)


L = [x if isinstance(x, tuple) else (x, 'new') for x in df_final.columns.tolist()]
df_final.columns = pd.MultiIndex.from_tuples(L)   
print (df_final)
  name Large        Mini       Topix       group
   new  sell   buy  sell   buy  sell   buy   new
0    a  10.0  20.0   NaN   NaN  10.0  70.0     1
1    b  60.0  30.0  60.0  30.0  80.0  30.0     1
2    c  50.0  40.0  20.0  50.0   0.0  40.0     2
3    d   NaN   NaN  10.0  40.0   NaN   NaN     2
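
The padding step in the list comprehension can be looked at in isolation. The sketch below applies it to a hypothetical list of labels (not produced by a real merge) that mixes plain strings with tuples, which is the shape the flattened columns take above.

```python
import pandas as pd

# Hypothetical flat labels: plain strings next to tuples that came
# from flattened two-level columns.
flat_labels = ["name", ("Large", "sell"), ("Large", "buy"), "group"]

# Pad every plain label with a placeholder second level ("new");
# tuples are already two levels deep and pass through unchanged.
padded = [x if isinstance(x, tuple) else (x, "new") for x in flat_labels]
columns = pd.MultiIndex.from_tuples(padded)
print(columns.tolist())
```

Every entry must end up as a tuple of the same length, otherwise the labels cannot form a regular two-level MultiIndex.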

Edit: if group is needed in the row MultiIndex:

group = group.set_index(['name'])
group.columns = pd.MultiIndex.from_product([group.columns, ['new']])

df_final = (df_out.merge(group, on=['name'], how='left')
                  .set_index([('group','new')], append=True)
                  .rename_axis(['name','group']))
print (df_final)
product    Large        Mini       Topix      
buy_sell    sell   buy  sell   buy  sell   buy
name group                                    
a    1      10.0  20.0   NaN   NaN  10.0  70.0
b    1      60.0  30.0  60.0  30.0  80.0  30.0
c    2      50.0  40.0  20.0  50.0   0.0  40.0
d    2       NaN   NaN  10.0  40.0   NaN   NaN

Or:

df_final = df_out.merge(group, on=['name'], how='left').set_index(['name','group'])
df_final.columns = pd.MultiIndex.from_tuples(df_final.columns)
print (df_final)
           Large        Mini       Topix      
            sell   buy  sell   buy  sell   buy
name group                                    
a    1      10.0  20.0   NaN   NaN  10.0  70.0
b    1      60.0  30.0  60.0  30.0  80.0  30.0
c    2      50.0  40.0  20.0  50.0   0.0  40.0
d    2       NaN   NaN  10.0  40.0   NaN   NaN
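
As an alternative that is not part of the original answer, group can be appended to the row index without any merge, which sidesteps both the placeholder level and the level-mismatch question. This is only a sketch and assumes each name appears at most once in group; names missing from group would get NaN as their group.

```python
import pandas as pd

data_large = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 60, 50], "buy": [20, 30, 40]})
data_mini = pd.DataFrame({"name": ["b", "c", "d"], "sell": [60, 20, 10], "buy": [30, 50, 40]})
data_topix = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 80, 0], "buy": [70, 30, 40]})

df_out = (
    pd.concat(
        [d.set_index("name") for d in (data_large, data_mini, data_topix)],
        keys=["Large", "Mini", "Topix"],
        axis=1,
    )
    .rename_axis(index="name", columns=["product", "buy_sell"])
)

group = pd.DataFrame({"name": ["a", "b", "c", "d"], "group": [1, 1, 2, 2]})

# Look up each row's group in the row order of df_out, then append
# that Series to the existing index as a second level named "group".
groups = group.set_index("name")["group"].reindex(df_out.index)
df_final = df_out.set_index(groups, append=True)
print(df_final)
```

The columns of df_out are left untouched, so the product and buy_sell levels stay exactly as they were.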

【Comments】:

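Thanks for your answer! Is it possible to include group in the row index? After the join, I would like both name and group in the row index. Your solution gives two column levels (group and new), which I believe is because the left dataframe has two levels. But I want to avoid this new level, so I think it would be best to include group in the row index ...

@MakotoMiyazaki - It is not so easy; I added a solution.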
