Combine Multiple Data Columns in Pandas

Posted 2023-03-11


【Title】: Combine Multiple Data Columns in Pandas 【Posted】: 2019-11-19 08:12:16 【Question】:

I have the following pandas DataFrame:

df = 
    1.0         2.0         3.0             4.0         5.0
(1083, 596)                             (1050, 164)   (1050, 164)   
(1081, 595)                             (1050, 164)   (1080, 162)
(1081, 594)                             (1049, 163)   (1070, 164)
(1082, 593) 
            (1050, 164)     
            (1050, 164)     
            (1049, 163)     
            (1049, 163)     

                        (1052, 463)
                        (1051, 468)
                        (1054, 465)
                        (1057, 463)

I need a brand-new DataFrame, df2, with 3 columns: 1.0, 2.0 (combining 2.0 and 4.0), and 3.0 (combining 3.0 and 5.0).

The result would be:

df2 = 
    1.0         2.0         3.0     
(1083, 596) (1050, 164)   (1050, 164)   
(1081, 595) (1050, 164)   (1080, 162)
(1081, 594) (1049, 163)   (1070, 164)
(1082, 593) 
            (1050, 164)     
            (1050, 164)     
            (1049, 163)     
            (1049, 163)     

                        (1052, 463)
                        (1051, 468)
                        (1054, 465)
                        (1057, 463)

You can expect no overlapping values in the columns being merged: if one column in a row holds a valid value, the other column holds NaN.

I tried:

df.fillna(0)
df2['2.0']=df['2.0']+df['4.0']

It did not work as expected. Is there a simple and efficient way to do this?


【Answer 1】:

This is basically a copy and paste. I think it works.

# copy values over to your other columns
# note: [0:3,'2.0'] gets the first 4 rows (index 0 to 3) of column '2.0'
# then you set it equal to the first 4 rows of column '4.0'

df.loc[0:3,'2.0'] = df.loc[0:3,'4.0'] 
df.loc[0:3,'3.0'] = df.loc[0:3,'5.0'] 


# just get the three columns you need


df2 = df[['1.0','2.0','3.0']]


           1.0          2.0          3.0
0   (1083, 596)  (1050, 164)  (1050, 164)
1   (1081, 595)  (1050, 164)  (1080, 162)
2   (1081, 594)  (1049, 163)  (1070, 164)
3   (1082, 593)          NaN          NaN
4           NaN  (1050, 164)          NaN
5           NaN  (1050, 164)          NaN
6           NaN  (1049, 163)          NaN
7           NaN  (1049, 163)          NaN
8           NaN          NaN          NaN
9           NaN          NaN  (1052, 463)
10          NaN          NaN  (1051, 468)
11          NaN          NaN  (1054, 465)
12          NaN          NaN  (1057, 463)

If your column names are actually floats, drop the quotes: for example, change df.loc[0:3,'2.0'] to df.loc[0:3,2.0], like this:

df.loc[0:3,2.0] = df.loc[0:3,4.0] 
df.loc[0:3,3.0] = df.loc[0:3,5.0]
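
The row range 0:3 above is specific to this particular frame. As a minimal sketch of a variant that does not depend on row positions, Series.combine_first can fill each gap from its paired column. This assumes the blanks are NaN and the column labels are strings; the small sample frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like the question's data (blank cells are NaN).
df = pd.DataFrame({
    "1.0": [(1083, 596), np.nan, np.nan],
    "2.0": [np.nan, (1050, 164), np.nan],
    "3.0": [np.nan, np.nan, (1052, 463)],
    "4.0": [(1050, 164), np.nan, np.nan],
    "5.0": [(1080, 162), np.nan, np.nan],
})

# Keep 1.0 as is; take 2.0 and 3.0, falling back to 4.0 and 5.0 where they are NaN.
df2 = pd.DataFrame({
    "1.0": df["1.0"],
    "2.0": df["2.0"].combine_first(df["4.0"]),
    "3.0": df["3.0"].combine_first(df["5.0"]),
})
print(df2)
```

Because this builds a new frame, the original df is left unchanged.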


【Answer 2】:

You can use Series.where() together with a null check to blend the values in the way you attempted:

# start from column 1.0, then fill the gaps in 2.0 and 3.0 from 4.0 and 5.0
df2 = df[["1.0"]].copy()
df2["2.0"] = df["2.0"].where(df["2.0"].notnull(), df["4.0"])
df2["3.0"] = df["3.0"].where(df["3.0"].notnull(), df["5.0"])
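
To see the where() approach end to end, here is a minimal sketch on a made-up frame shaped like the question's data (string column labels and NaN blanks are assumptions). Both the null check and the fallback column are read from the original df, since df2 starts out holding only column 1.0.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like the question's data (blank cells are NaN).
df = pd.DataFrame({
    "1.0": [(1081, 595), np.nan, np.nan],
    "2.0": [np.nan, (1049, 163), np.nan],
    "3.0": [np.nan, np.nan, (1051, 468)],
    "4.0": [(1049, 163), np.nan, np.nan],
    "5.0": [(1070, 164), np.nan, np.nan],
})

df2 = df[["1.0"]].copy()
for target, fallback in [("2.0", "4.0"), ("3.0", "5.0")]:
    # where() keeps the value where the condition is True,
    # and takes the aligned value from the fallback column otherwise.
    df2[target] = df[target].where(df[target].notna(), df[fallback])
print(df2)
```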


【Answer 3】:

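This assumes the blanks in df are NaN. You just need to drop column 1.0, shift the remaining columns (2.0, 3.0, 4.0, 5.0) left by 2 positions, and then call combine_first on df with the shifted frame. Finally, use iloc to select the first 3 columns.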

df2 = df.combine_first(df.drop(columns='1.0').shift(-2, axis=1)).iloc[:, :3]

Out[297]:
           1.0         2.0         3.0
0   (1083, 596)  (1050, 164)  (1050, 164)
1   (1081, 595)  (1050, 164)  (1080, 162)
2   (1081, 594)  (1049, 163)  (1070, 164)
3   (1082, 593)         NaN         NaN
4          NaN  (1050, 164)         NaN
5          NaN  (1050, 164)         NaN
6          NaN  (1049, 163)         NaN
7          NaN  (1049, 163)         NaN
8          NaN         NaN  (1052, 463)
9          NaN         NaN  (1051, 468)
10         NaN         NaN  (1054, 465)
11         NaN         NaN  (1057, 463)
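
A small runnable sketch of the shift-then-combine_first idea, on a made-up sample frame with string column labels (both are assumptions). It spells the drop as drop(columns=...), since pandas 2.0 and later no longer accept the axis as a positional argument.

```python
import numpy as np
import pandas as pd

# Hypothetical sample shaped like the question's data (blank cells are NaN).
df = pd.DataFrame({
    "1.0": [(1082, 593), np.nan, np.nan],
    "2.0": [np.nan, (1050, 164), np.nan],
    "3.0": [np.nan, np.nan, (1057, 463)],
    "4.0": [(1049, 163), np.nan, np.nan],
    "5.0": [(1050, 164), np.nan, np.nan],
})

# After the shift, the old 4.0 and 5.0 values sit under the labels 2.0 and 3.0.
shifted = df.drop(columns="1.0").shift(-2, axis=1)

# combine_first keeps df's own values and fills its NaN cells from `shifted`.
df2 = df.combine_first(shifted).iloc[:, :3]
print(df2)
```

This relies on the paired columns sitting exactly two positions apart; if the layout differs, pairing the columns by name is the safer route.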

