Python: Join two DataFrames with same column prefix on same indices

Posted 2023-02-24

【Title】: Python: Join two DataFrames with same column prefix on same indices
【Posted】: 2022-01-23 04:44:50
【Question】:

I have two DataFrames that look like this:

import pandas as pd

df1 = pd.DataFrame(
    {
        "A_price": [10, 12],
        "B_price": [20, 21],
    },
    index=['01-01-2020', '01-02-2021']
)
df1:
            A_price B_price
01-01-2020  10      20
01-02-2021  12      21

df2 = pd.DataFrame(
    {
        "A_weight": [0.1, 0.12],
        "B_weight": [0.2, 0.21],
    },
    index=['01-01-2020', '01-02-2021']
)
df2:
            A_weight B_weight
01-01-2020  0.1      0.2
01-02-2021  0.12     0.21

How can I join the two DataFrames on their shared index and then arrange the columns in a hierarchy? That is, I want the following:

df:
            A              B
            price weight   price weight      
01-01-2020  10    0.1      20    0.2
01-02-2021  12    0.12     21    0.21

【Answer 1】:

Just use pd.concat with axis=1 to concatenate horizontally, then split the column names on _ with .columns.str.split (with expand=True it returns a MultiIndex):

new_df = pd.concat([df1, df2], axis=1)
new_df.columns = new_df.columns.str.split('_', expand=True)

Output:

>>> new_df
               A     B      A      B
           price price weight weight
01-01-2020    10    20   0.10   0.20
01-02-2021    12    21   0.12   0.21
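
In the output above the columns keep their original order, so A and B are interleaved. A minimal sketch, assuming the same df1 and df2 as in the question, of sorting the column MultiIndex so each prefix's columns sit together (the name `grouped` is only illustrative):

```python
import pandas as pd

idx = ["01-01-2020", "01-02-2021"]
df1 = pd.DataFrame({"A_price": [10, 12], "B_price": [20, 21]}, index=idx)
df2 = pd.DataFrame({"A_weight": [0.1, 0.12], "B_weight": [0.2, 0.21]}, index=idx)

# Concatenate side by side, then split "A_price" into ("A", "price").
new_df = pd.concat([df1, df2], axis=1)
new_df.columns = new_df.columns.str.split("_", expand=True)

# Sort the column labels so that all "A" columns come before all "B" columns.
grouped = new_df.sort_index(axis=1)
print(grouped)
```

Sorting is lexicographic on both levels, so this grouping relies on the prefixes and suffixes having a sensible alphabetical order.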

【讨论】：

【Answer 2】:

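This should give you a two-level column index. Note that keys labels each source frame, so the outer level "A" holds A_price and B_price and "B" holds A_weight and B_weight; the columns end up grouped by frame, not by prefix.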

pd.concat((df1.T,df2.T), keys=["A", "B"]).T
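
A keyed concat can produce the layout asked for in the question if the keys name the measurement ("price", "weight") instead of the frame. A rough sketch, assuming the same df1 and df2 and that every column name has the form prefix_suffix; the helper `strip_suffix` and the variable names are illustrative, not part of the original answer:

```python
import pandas as pd

idx = ["01-01-2020", "01-02-2021"]
df1 = pd.DataFrame({"A_price": [10, 12], "B_price": [20, 21]}, index=idx)
df2 = pd.DataFrame({"A_weight": [0.1, 0.12], "B_weight": [0.2, 0.21]}, index=idx)

def strip_suffix(name):
    # Hypothetical helper: "A_price" -> "A"
    return name.split("_")[0]

# Keep only the prefix as the column name in each frame.
price_part = df1.rename(columns=strip_suffix)
weight_part = df2.rename(columns=strip_suffix)

# Outer level is the measurement, inner level is the prefix ...
out = pd.concat([price_part, weight_part], keys=["price", "weight"], axis=1)
# ... so swap the two levels and sort to get A/B on top.
out = out.swaplevel(axis=1).sort_index(axis=1)
print(out)
```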

【Answer 3】:

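You can use pd.concat with the keys argument together with sort_index() to get the columns into the right structure. Then rename the inner level of the MultiIndex to strip the prefix from the column names. Note that the keys 'A' and 'B' here label df1 and df2 rather than the column prefixes, so after the rename the labels (A, price) and (B, weight) each appear twice: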

df = pd.concat([df1, df2], keys=['A','B'],axis=1).sort_index(level=1, axis=1)
df.rename(columns=lambda x: x.split('_')[1], level=1)

               A      B     A      B
           price weight price weight
01-01-2020    10   0.10    20   0.20
01-02-2021    12   0.12    21   0.21

【Answer 4】:

Use join (or merge) and expand the column names.

# Alternative: out = pd.merge(df1, df2, left_index=True, right_index=True)
out = df1.join(df2)
out.columns = out.columns.str.split('_', expand=True)
out = out.sort_index(axis=1)
print(out)

# Output:
               A            B       
           price weight price weight
01-01-2020    10   0.10    20   0.20
01-02-2021    12   0.12    21   0.21
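
Once the columns are hierarchical, either level can be used for selection. A short sketch, assuming the same df1 and df2 and the join-based result above; the names `a_block` and `prices` are illustrative:

```python
import pandas as pd

idx = ["01-01-2020", "01-02-2021"]
df1 = pd.DataFrame({"A_price": [10, 12], "B_price": [20, 21]}, index=idx)
df2 = pd.DataFrame({"A_weight": [0.1, 0.12], "B_weight": [0.2, 0.21]}, index=idx)

out = df1.join(df2)
out.columns = out.columns.str.split("_", expand=True)
out = out.sort_index(axis=1)

# Select everything under the outer label "A": columns become price, weight.
a_block = out["A"]

# Take a cross-section of the inner level: columns become A, B.
prices = out.xs("price", axis=1, level=1)

print(a_block)
print(prices)
```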
