Insert value into column which is named in known column pandas
【Title】: Insert value into column which is named in known column pandas 【Posted】: 2017-04-26 09:14:37 【Question】: I am preparing data for machine learning. The data is in a pandas DataFrame that looks like this:
Column v1 v2
first 1 2
second 3 4
third 5 6
Now I want to turn it into:
Column v1 v2 first-v1 first-v2 second-v1 second-v2 third-v1 third-v2
first 1 2 1 2 NaN NaN NaN NaN
second 3 4 NaN NaN 3 4 NaN NaN
third 5 6 NaN NaN NaN NaN 5 6
I tried doing something like this:
# we know how many values there are but
# length can be changed into length of [1, 2, 3, ...] values
values = ['v1', 'v2']
# data with description from above is saved in data
for value in values:
    data[str(data['Column'] + '-' + value)] = data[value]
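The result is columns with the names:
['first-v1' 'second-v1'..], ['first-v2' 'second-v2'..]
and they do contain the correct values. What am I doing wrong? Since my data is large, is there a more efficient way to do this?
Thanks for your time!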
【Answer 1】: You can use unstack, then swap and sort the levels of the column MultiIndex:
# parentheses let the method chain span several lines
df = (data.set_index('Column', append=True)[values]
          .unstack()
          .swaplevel(0, 1, axis=1)
          .sort_index(axis=1))
df.columns = df.columns.map('-'.join)
print (df)
first-v1 first-v2 second-v1 second-v2 third-v1 third-v2
0 1.0 2.0 NaN NaN NaN NaN
1 NaN NaN 3.0 4.0 NaN NaN
2 NaN NaN NaN NaN 5.0 6.0
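To see what each step of the chain contributes, here is a minimal self-contained sketch. The sample frame is rebuilt from the question, and the intermediate names `indexed` and `wide` are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the real layout).
data = pd.DataFrame({
    "Column": ["first", "second", "third"],
    "v1": [1, 3, 5],
    "v2": [2, 4, 6],
})
values = ["v1", "v2"]

# Step 1: add 'Column' as a second row-index level next to the 0..n-1 index.
indexed = data.set_index("Column", append=True)[values]

# Step 2: unstack moves the 'Column' level into the columns, producing a
# two-level column MultiIndex of (value name, Column label); cells with no
# matching row become NaN.
wide = indexed.unstack()

# Step 3: put the Column label first, then sort so that the values of each
# label sit next to each other.
wide = wide.swaplevel(0, 1, axis=1).sort_index(axis=1)

# Step 4: flatten each (label, value name) tuple into a 'label-value' string.
wide.columns = wide.columns.map("-".join)

print(wide)
```

The values come out as floats because NaN cannot be stored in an integer column.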
Or stack + unstack:
df = data.set_index('Column', append=True).stack().unstack([1,2])
df.columns = df.columns.map('-'.join)
print (df)
first-v1 first-v2 second-v1 second-v2 third-v1 third-v2
0 1.0 2.0 NaN NaN NaN NaN
1 NaN NaN 3.0 4.0 NaN NaN
2 NaN NaN NaN NaN 5.0 6.0
Finally, join the result back to the original:
df = data.join(df)
print (df)
Column v1 v2 first-v1 first-v2 second-v1 second-v2 third-v1 \
0 first 1 2 1.0 2.0 NaN NaN NaN
1 second 3 4 NaN NaN 3.0 4.0 NaN
2 third 5 6 NaN NaN NaN NaN 5.0
third-v2
0 NaN
1 NaN
2 6.0
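As for what went wrong in the question's loop: `str(...)` turns the whole Series of names into one single string, so each pass creates only one column with that long string as its name, instead of one column per label. If an explicit loop is preferred, a rough sketch along the following lines should work; it assumes the same sample data as in the question, and `result`, `label` and `mask` are illustrative names.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the real layout).
data = pd.DataFrame({
    "Column": ["first", "second", "third"],
    "v1": [1, 3, 5],
    "v2": [2, 4, 6],
})
values = ["v1", "v2"]

result = data.copy()
for label in data["Column"].unique():
    # True only on the rows whose 'Column' entry equals this label.
    mask = data["Column"] == label
    for value in values:
        # where() keeps the value on matching rows and puts NaN elsewhere.
        result[f"{label}-{value}"] = data[value].where(mask)

print(result)
```

This adds one column per label and value, so with many distinct labels the reshaping approaches above are probably the better fit for large data.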
【Comments】:
Wow, thank you for the answer. I would not have figured this out on my own. Thanks again!