pandas pivot table to data frame [duplicate]
【Posted】2017-07-31 04:43:12
【Question】I have a data frame (df) that looks like this:
+---------+-------+------------+----------+
| subject | pills | date | strength |
+---------+-------+------------+----------+
| 1 | 4 | 10/10/2012 | 250 |
| 1 | 4 | 10/11/2012 | 250 |
| 1 | 2 | 10/12/2012 | 500 |
| 2 | 1 | 1/6/2014 | 1000 |
| 2 | 1 | 1/7/2014 | 250 |
| 2 | 1 | 1/7/2014 | 500 |
| 2 | 3 | 1/8/2014 | 250 |
+---------+-------+------------+----------+
When I use reshape in R, I get what I want:
reshape(df, idvar = c("subject","date"), timevar = 'strength', direction = "wide")
+---------+------------+--------------+--------------+---------------+
| subject | date | strength.250 | strength.500 | strength.1000 |
+---------+------------+--------------+--------------+---------------+
| 1 | 10/10/2012 | 4 | NA | NA |
| 1 | 10/11/2012 | 4 | NA | NA |
| 1 | 10/12/2012 | NA | 2 | NA |
| 2 | 1/6/2014 | NA | NA | 1 |
| 2 | 1/7/2014 | 1 | 1 | NA |
| 2 | 1/8/2014 | 3 | NA | NA |
+---------+------------+--------------+--------------+---------------+
With pandas:
pd.pivot_table(df, index=['subject', 'date'], columns='strength')
+---------+------------+-------+----+-----+
| | | pills |
+---------+------------+-------+----+-----+
| | strength | 250 | 500| 1000|
+---------+------------+-------+----+-----+
| subject | date | | | |
+---------+------------+-------+----+-----+
| 1 | 10/10/2012 | 4 | NA | NA |
| | 10/11/2012 | 4 | NA | NA |
| | 10/12/2012 | NA | 2 | NA |
+---------+------------+-------+----+-----+
| 2 | 1/6/2014 | NA | NA | 1 |
| | 1/7/2014 | 1 | 1 | NA |
| | 1/8/2014 | 3 | NA | NA |
+---------+------------+-------+----+-----+
How can I get exactly the same output as in R using pandas? I only want one header row.
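For context, here is a minimal sketch of where the second header row comes from. It assumes the sample data shown above, typed in by hand; the names `two_level` and `one_level` are illustrative and not from the original post. When `values` is not passed, `pivot_table` keeps the value column name (`pills`) as an extra column level, and `reset_index()` does not remove it.

```python
import pandas as pd

# Sample data from the question, entered by hand
df = pd.DataFrame({
    "subject":  [1, 1, 1, 2, 2, 2, 2],
    "pills":    [4, 4, 2, 1, 1, 1, 3],
    "date":     ["10/10/2012", "10/11/2012", "10/12/2012",
                 "1/6/2014", "1/7/2014", "1/7/2014", "1/8/2014"],
    "strength": [250, 250, 500, 1000, 250, 500, 250],
})

# No `values` argument: the columns become a two-level MultiIndex
# with entries such as ('pills', 250)
two_level = pd.pivot_table(df, index=["subject", "date"], columns="strength")
print(two_level.columns.nlevels)                # number of header levels

# reset_index() moves subject/date into columns but keeps both levels
print(two_level.reset_index().columns.nlevels)

# Dropping the outer level leaves a single header row
one_level = two_level.droplevel(0, axis=1)
print(list(one_level.columns))
```

Note that `pivot_table` aggregates with the mean by default, so any duplicated (subject, date, strength) combination would be averaged; the sample data has none.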
【Comments】:
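Calling reset_index() on your pivoted df will give you the expected output.
Not quite... it gives me 2 headers.
Your original data frame has no "patient" header. Where does it come from?
Sorry, typo; see the edit above.
Where do the numbers 25, 50 and 250 come from? Please show us a consistent example.
【Answer 1】: After pivoting, convert the data frame to records and then back to a data frame: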
# `pivoted` is the result of the pivot_table call from the question
flattened = pd.DataFrame(pivoted.to_records())
# subject date ('pills', 250) ('pills', 500) ('pills', 1000)
#0 1 10/10/2012 4.0 NaN NaN
#1 1 10/11/2012 4.0 NaN NaN
#2 1 10/12/2012 NaN 2.0 NaN
#3 2 1/6/2014 NaN NaN 1.0
#4 2 1/7/2014 1.0 1.0 NaN
#5 2 1/8/2014 3.0 NaN NaN
If needed, you can now "fix" the column names:
flattened.columns = [hdr.replace("('pills', ", "strength.").replace(")", "") \
for hdr in flattened.columns]
flattened
# subject date strength.250 strength.500 strength.1000
#0 1 10/10/2012 4.0 NaN NaN
#1 1 10/11/2012 4.0 NaN NaN
#2 1 10/12/2012 NaN 2.0 NaN
#3 2 1/6/2014 NaN NaN 1.0
#4 2 1/7/2014 1.0 1.0 NaN
#5 2 1/8/2014 3.0 NaN NaN
It's awkward, but it works.
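A variant that avoids editing the header strings, offered as a sketch and not as part of the original answer: passing `values="pills"` as a scalar should keep the pivoted columns single-level, after which `add_prefix` and `reset_index` give one header row. The sample data is typed in from the question, and the name `flat` is illustrative.

```python
import pandas as pd

# Sample data from the question, entered by hand
df = pd.DataFrame({
    "subject":  [1, 1, 1, 2, 2, 2, 2],
    "pills":    [4, 4, 2, 1, 1, 1, 3],
    "date":     ["10/10/2012", "10/11/2012", "10/12/2012",
                 "1/6/2014", "1/7/2014", "1/7/2014", "1/8/2014"],
    "strength": [250, 250, 500, 1000, 250, 500, 250],
})

# A scalar `values` keeps the columns as a plain Index: 250, 500, 1000
pivoted = df.pivot_table(index=["subject", "date"],
                         columns="strength", values="pills")

# Prefix the strength labels, then turn subject/date back into columns
flat = pivoted.add_prefix("strength.").reset_index()

# Remove the leftover "strength" label on the column axis
flat.columns.name = None

print(flat)
```

As in the answer above, missing combinations show up as NaN and the counts become floats, because the default aggregation is the mean.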
【Comments】:
Perfect, thanks!