Joining two pandas dataframes with multi-indexed columns
Posted: 2020-12-04 07:15:33
Question: I want to join two pandas dataframes, one of which has multi-indexed columns.
This is how I build the first dataframe:
import pandas as pd
data_large = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 60, 50], "buy": [20, 30, 40]})
data_mini = pd.DataFrame({"name": ["b", "c", "d"], "sell": [60, 20, 10], "buy": [30, 50, 40]})
data_topix = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 80, 0], "buy": [70, 30, 40]})
df_out = pd.concat([dfi.set_index('name') for dfi in [data_large, data_mini, data_topix]],
keys=['Large', 'Mini', 'Topix'], axis=1)\
.rename_axis(mapper=['name'], axis=0).rename_axis(mapper=['product','buy_sell'], axis=1)
df_out
This is the second dataframe:
group = pd.DataFrame({"name": ["a", "b", "c", "d"], "group": [1, 1, 2, 2]})
group
How can I join the second dataframe onto the first on the `name` column while keeping the multi-indexed columns?
This doesn't work; it flattens the multi-index:
df_final = df_out.merge(group, on=['name'], how='left')
Any help would be appreciated!
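Answer 1: If you need a MultiIndex after `merge`, you have to convert `group` into a DataFrame with MultiIndex columns as well. Here the `name` column is moved into the index so the merge happens on the index; otherwise both columns would have to be converted to a MultiIndex: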
group = group.set_index('name')
group.columns = pd.MultiIndex.from_product([group.columns, ['new']])
df_final = df_out.merge(group, on=['name'], how='left')
Or:
df_final = df_out.merge(group, left_index=True, right_index=True, how='left')
print (df_final)
product Large Mini Topix group
buy_sell sell buy sell buy sell buy new
name
a 10.0 20.0 NaN NaN 10.0 70.0 1
b 60.0 30.0 60.0 30.0 80.0 30.0 1
c 50.0 40.0 20.0 50.0 0.0 40.0 2
d NaN NaN 10.0 40.0 NaN NaN 2
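A self-contained sketch of the index-based approach above, assuming a recent pandas install. It rebuilds the question's data and swaps `merge(..., left_index=True, right_index=True)` for `DataFrame.join`, which joins on the index by default; the `frames` and `right` variable names are illustrative, not from the answer.

```python
import pandas as pd

# Rebuild the question's data.
frames = {
    "Large": pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 60, 50], "buy": [20, 30, 40]}),
    "Mini": pd.DataFrame({"name": ["b", "c", "d"], "sell": [60, 20, 10], "buy": [30, 50, 40]}),
    "Topix": pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 80, 0], "buy": [70, 30, 40]}),
}
df_out = pd.concat(
    [f.set_index("name") for f in frames.values()], keys=list(frames), axis=1
).rename_axis(index="name", columns=["product", "buy_sell"])
group = pd.DataFrame({"name": ["a", "b", "c", "d"], "group": [1, 1, 2, 2]})

# Move `name` into the index and give the remaining column a second level,
# so both frames have two column levels.
right = group.set_index("name")
right.columns = pd.MultiIndex.from_product([right.columns, ["new"]])

# Left join on the shared `name` index; the column MultiIndex is kept.
df_final = df_out.join(right, how="left")
print(df_final)
```

Because both sides have the same number of column levels, the join does not flatten anything.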
Another possible approach is to convert the column labels to a MultiIndex after `merge`:
df_final = df_out.merge(group, on=['name'], how='left')
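UserWarning: merging between different levels can give an unintended result (2 levels on the left, 1 on the right) warnings.warn(msg, UserWarning)
Note: in pandas 2.0 and later, merging frames with different numbers of column levels raises a MergeError instead of this warning, so this approach (like the merge in the question) only runs on older versions.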
L = [x if isinstance(x, tuple) else (x, 'new') for x in df_final.columns.tolist()]
df_final.columns = pd.MultiIndex.from_tuples(L)
print (df_final)
name Large Mini Topix group
new sell buy sell buy sell buy new
0 a 10.0 20.0 NaN NaN 10.0 70.0 1
1 b 60.0 30.0 60.0 30.0 80.0 30.0 1
2 c 50.0 40.0 20.0 50.0 0.0 40.0 2
3 d NaN NaN 10.0 40.0 NaN NaN 2
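The same "merge flat, then rebuild the MultiIndex" idea can be done without merging frames whose column levels differ. A minimal sketch, assuming it is acceptable to flatten `df_out`'s columns to tuple labels first with `MultiIndex.to_flat_index`; `frames`, `flat` and `merged` are illustrative names:

```python
import pandas as pd

# Rebuild the question's data.
frames = {
    "Large": pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 60, 50], "buy": [20, 30, 40]}),
    "Mini": pd.DataFrame({"name": ["b", "c", "d"], "sell": [60, 20, 10], "buy": [30, 50, 40]}),
    "Topix": pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 80, 0], "buy": [70, 30, 40]}),
}
df_out = pd.concat(
    [f.set_index("name") for f in frames.values()], keys=list(frames), axis=1
).rename_axis(index="name", columns=["product", "buy_sell"])
group = pd.DataFrame({"name": ["a", "b", "c", "d"], "group": [1, 1, 2, 2]})

# Turn the two-level columns into one level of tuple labels,
# so both sides of the merge have a single column level.
flat = df_out.copy()
flat.columns = df_out.columns.to_flat_index()

# Bring `name` back as an ordinary column and merge on it.
merged = flat.reset_index().merge(group, on="name", how="left")

# Rebuild the MultiIndex: tuple labels stay, plain labels get "new" as level 2.
merged.columns = pd.MultiIndex.from_tuples(
    [c if isinstance(c, tuple) else (c, "new") for c in merged.columns]
)
print(merged)
```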
Edit: if you need `group` in the row MultiIndex:
group = group.set_index(['name'])
group.columns = pd.MultiIndex.from_product([group.columns, ['new']])
df_final = (df_out.merge(group, on=['name'], how='left')
.set_index([('group','new')], append=True)
.rename_axis(['name','group']))
print (df_final)
product Large Mini Topix
buy_sell sell buy sell buy sell buy
name group
a 1 10.0 20.0 NaN NaN 10.0 70.0
b 1 60.0 30.0 60.0 30.0 80.0 30.0
c 2 50.0 40.0 20.0 50.0 0.0 40.0
d 2 NaN NaN 10.0 40.0 NaN NaN
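If the goal is only to add `group` as an index level, a merge may not be needed at all. A minimal sketch of a swapped-in technique (not the answer's): look up each `name` in a `name` to `group` Series with `Index.map`, then append the result as a new level with `set_index(..., append=True)`; `frames`, `mapping` and `group_level` are illustrative names.

```python
import pandas as pd

# Rebuild the question's data.
frames = {
    "Large": pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 60, 50], "buy": [20, 30, 40]}),
    "Mini": pd.DataFrame({"name": ["b", "c", "d"], "sell": [60, 20, 10], "buy": [30, 50, 40]}),
    "Topix": pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 80, 0], "buy": [70, 30, 40]}),
}
df_out = pd.concat(
    [f.set_index("name") for f in frames.values()], keys=list(frames), axis=1
).rename_axis(index="name", columns=["product", "buy_sell"])
group = pd.DataFrame({"name": ["a", "b", "c", "d"], "group": [1, 1, 2, 2]})

# Series indexed by name, holding each name's group number.
mapping = group.set_index("name")["group"]

# Map every row label of df_out to its group and append it as a second index level.
group_level = df_out.index.map(mapping).rename("group")
df_final = df_out.set_index(group_level, append=True)
print(df_final)
```

The columns are never touched, so their level names (`product`, `buy_sell`) survive unchanged.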
Or:
df_final = df_out.merge(group, on=['name'], how='left').set_index(['name','group'])
df_final.columns = pd.MultiIndex.from_tuples(df_final.columns)
print (df_final)
Large Mini Topix
sell buy sell buy sell buy
name group
a 1 10.0 20.0 NaN NaN 10.0 70.0
b 1 60.0 30.0 60.0 30.0 80.0 30.0
c 2 50.0 40.0 20.0 50.0 0.0 40.0
d 2 NaN NaN 10.0 40.0 NaN NaN
Comments:
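Thanks for your answer! Is it possible to include `group` in the row index? That is, after the join I'd like to have both `name` and `group` in the row index. Your solution gives two column levels (`group` and `new`), which I believe is because the left dataframe has two levels. I want to avoid this `new`, so I think it would be better to put `group` in the row index...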
@MakotoMiyazaki - It's not that easy; I've added a solution.
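If keeping `group` as a column is acceptable and the only concern is the arbitrary `new` label, one option not taken from the thread is to add the column with an empty string as its second-level label. A minimal sketch, assuming the same `Index.map` lookup; `frames` and `mapping` are illustrative names:

```python
import pandas as pd

# Rebuild the question's data.
frames = {
    "Large": pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 60, 50], "buy": [20, 30, 40]}),
    "Mini": pd.DataFrame({"name": ["b", "c", "d"], "sell": [60, 20, 10], "buy": [30, 50, 40]}),
    "Topix": pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 80, 0], "buy": [70, 30, 40]}),
}
df_out = pd.concat(
    [f.set_index("name") for f in frames.values()], keys=list(frames), axis=1
).rename_axis(index="name", columns=["product", "buy_sell"])
group = pd.DataFrame({"name": ["a", "b", "c", "d"], "group": [1, 1, 2, 2]})

# Series indexed by name, holding each name's group number.
mapping = group.set_index("name")["group"]

# A ("group", "") key adds a column whose second-level label is empty.
df_final = df_out.copy()
df_final[("group", "")] = df_final.index.map(mapping)
print(df_final)
```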