Python Pandas: Merge Columns of Data Frame with column name into one column
Posted: 2019-01-08 01:31:27
Question: I have data in my data frame in the following format:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))
>>> df
          A         B         C         D
0  0.578095 -1.985742 -0.269517 -0.180319
1 -0.618431 -0.937284  0.556290 -1.416877
2  1.695109  0.122219  0.182450  0.411448
3  0.228466  0.268943 -1.249488  3.227840
4  0.005990 -0.805618 -1.941092 -0.146649
5 -1.116451 -0.649854  1.272314  1.422760
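For each row, I want to combine some of the columns into one by pairing each value with its column name, producing the following output: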
          A         B  New Column
0  0.578095 -1.985742  {"C":"-0.269517","D":"-0.180319"}
1 -0.618431 -0.937284  {"C":"0.556290","D":"-1.416877"}
2  1.695109  0.122219  {"C":"0.182450","D":"0.411448"}
3  0.228466  0.268943  {"C":"-1.249488","D":"3.227840"}
4  0.005990 -0.805618  {"C":"-1.941092","D":"-0.146649"}
5 -1.116451 -0.649854  {"C":"1.272314","D":"1.422760"}
How can I do this in pandas?
The end goal is to get the data into JSON format, with columns C and D as measures for the dimensions A and B, and then store it in a table in Snowflake.
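A minimal sketch of producing exactly the string format shown in the question, assuming the desired values are the six-decimal display strings, quoted as in the example output. The fixed two-row frame and the name `measure_cols` are illustrative stand-ins, not from the question, and the Snowflake loading step is not covered.

```python
import json

import pandas as pd

# Small fixed frame standing in for the question's random data (illustrative values).
df = pd.DataFrame(
    {
        "A": [0.578095, -0.618431],
        "B": [-1.985742, -0.937284],
        "C": [-0.269517, 0.556290],
        "D": [-0.180319, -1.416877],
    }
)

measure_cols = ["C", "D"]  # hypothetical name: the columns folded into one JSON string

# Format each measure as a six-decimal string so the JSON values are quoted,
# matching the desired output, then turn every row into a dict.
records = df[measure_cols].apply(lambda col: col.map("{:.6f}".format)).to_dict("records")

# Serialize each row dict with compact separators: {"C":"...","D":"..."}.
df["New Column"] = [json.dumps(r, separators=(",", ":")) for r in records]

# Replace the measure columns with the combined column.
df = df.drop(columns=measure_cols)
print(df)
```

If the target should hold numbers rather than quoted strings, skipping the formatting step and serializing the raw floats is probably a better fit for JSON measures.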
Comments:
What have you tried so far?
Answer 1: Use to_dict with 'records':
df['New c'] = df[['C', 'D']].to_dict('records')
df
Out[580]:
          A         B         C         D  \
0  0.578095 -1.985742 -0.269517 -0.180319
1 -0.618431 -0.937284  0.556290 -1.416877
2  1.695109  0.122219  0.182450  0.411448
3  0.228466  0.268943 -1.249488  3.227840
4  0.005990 -0.805618 -1.941092 -0.146649
5 -1.116451 -0.649854  1.272314  1.422760

   New c
0  {'C': -0.269517, 'D': -0.180319}
1  {'C': 0.55629, 'D': -1.416877}
2  {'C': 0.18245, 'D': 0.411448}
3  {'C': -1.249488, 'D': 3.22784}
4  {'C': -1.9410919999999998, 'D': -0.146649}
5  {'C': 1.272314, 'D': 1.42276}
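A small sketch of Answer 1 combined with the follow-up in the comments (replacing C and D with the new column), on a made-up frame. `measure_cols` is a hypothetical name. Because `to_dict('records')` returns a plain list, the assignment is positional rather than index-aligned, so the list length must equal the number of rows.

```python
import pandas as pd

# Illustrative frame; values chosen so the result is easy to check.
df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [0.5, 0.25], "D": [-1.0, 2.0]})

measure_cols = ["C", "D"]  # hypothetical name for the columns being combined

# One dict per row, in row order; assigned to the new column by position.
df["New c"] = df[measure_cols].to_dict("records")

# Drop the source columns so only A, B and the dict column remain.
df = df.drop(columns=measure_cols)
print(df)
```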
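Comments:
This one works too. I just needed to replace the other columns with the new one, because the data frame is very large and I want to do this as efficiently as possible in terms of memory and performance, which is why I chose the other answer. But this one also works. Thanks!
Answer 2: Similar to the previous answer, but gives you JSON directly: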
df["New Column"] = df[["C", "D"]].agg(lambda x: x.to_json(), axis=1)
df.drop(columns=["C", "D"], inplace=True)
          A         B  New Column
0  0.203402  0.963421  {"C":0.0006991508,"D":0.6259404479}
1  0.259584  0.992885  {"C":0.4362059517,"D":0.198117864}
2  0.470500  0.242945  {"C":0.6507973014,"D":0.8585516803}
3  0.337716  0.937279  {"C":0.7682917478,"D":0.4398740192}
4  0.449790  0.863678  {"C":0.9256099517,"D":0.4139063442}
5  0.837881  0.310204  {"C":0.2481016705,"D":0.8652550757}
If you have NaN values, you can extend the lambda to drop them:
df["New Column"] = df[["C", "D"]].agg(lambda x: x.dropna().to_json(), axis=1)
df.drop(columns=["C", "D"], inplace=True)
Input:
          A         B         C         D
0  0.247098  0.318231  0.188487  0.604020
1  0.696833  0.554107  0.982078  0.047739
2  0.874721  0.557809       NaN  0.474376
3  0.185668  0.477824  0.900544       NaN
4  0.085932  0.808342  0.360703  0.331273
5  0.665791  0.011564  0.785515  0.177014
Result:
          A         B  New Column
0  0.247098  0.318231  {"C":0.1884867142,"D":0.6040197923}
1  0.696833  0.554107  {"C":0.9820776439,"D":0.0477394369}
2  0.874721  0.557809  {"D":0.4743764396}
3  0.185668  0.477824  {"C":0.9005440032}
4  0.085932  0.808342  {"C":0.3607030306,"D":0.3312725694}
5  0.665791  0.011564  {"C":0.7855148493,"D":0.1770143921}
Answer 3: Drop the columns and use agg to create a new column:
df2 = df.drop(['C', 'D'], axis=1).assign(
    New_Column=df[['C', 'D']].agg(pd.Series.to_dict, axis=1))
df2
          A         B  New_Column
0 -0.645719 -0.757112  {'D': 0.8923148471642509, 'C': -0.685995130541...
1 -0.124200 -0.578526  {'D': -0.5457121278891495, 'C': -1.46006615752...
2  2.160417 -0.985475  {'D': -0.49915307027471345, 'C': 0.85388172610...
3  2.111050  1.384887  {'D': -0.4617380879640236, 'C': 0.907519279458...
4  0.781630 -0.366445  {'D': -0.3105127375402184, 'C': 0.295808587414...
5  0.460773  0.549545  {'D': -0.993162129461116, 'C': 0.8163378188816...
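For comparison, a sketch of Answer 3 next to the `to_dict('records')` approach, on an illustrative frame with a deliberately non-default index. Both should produce the same per-row dicts; the row-wise `agg` result is a Series that keeps `df`'s index (so `assign` aligns on labels), while `to_dict('records')` is a plain list matched by position. On Python 3.7+ dicts preserve insertion order, so the keys should come out as C then D rather than the D-first order shown above.

```python
import pandas as pd

# Illustrative frame; the index [10, 20] is chosen on purpose to show alignment.
df = pd.DataFrame(
    {"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [0.5, 0.25], "D": [-1.0, 2.0]},
    index=[10, 20],
)

# Row-wise: each row Series becomes a dict; the result keeps df's index,
# so assign() matches it to rows by label.
df2 = df.drop(["C", "D"], axis=1).assign(
    New_Column=df[["C", "D"]].agg(pd.Series.to_dict, axis=1)
)

# Column-wise alternative: the same dicts in one call, returned as a list.
records = df[["C", "D"]].to_dict("records")

print(df2)
```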