Pandas sub- and total of pivot tables with multiindex
【Posted】: 2020-08-30 03:48:09
【Question】: I am trying to add a new column with the subtotals and a final column with the grand total. For example,
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})
That is:
A B C D E
0 foo one small 1 2
1 foo one large 2 4
2 foo one large 2 5
3 foo two small 3 5
4 foo two small 3 6
5 bar one large 4 6
6 bar one small 5 8
7 bar two small 6 9
8 bar two large 7 9
Now I pivot:
table = pd.pivot_table(df, values=['D', 'E'], index=['A'], columns=['C'])
and add the totals:
table['total'] = table.sum(axis=1)
for t in ["D", "E"]:
    table[t, "partial_total"] = table[t].sum(axis=1)
While this works numerically, it is visually annoying. I would like all the data for D (including partial_total), then E, then total. This is my resulting df:
         D                E                total              D              E
C    large     small  large     small             partial_total  partial_total
A
bar    5.5  5.500000    7.5  8.500000  27.000000      11.000000      16.000000
foo    2.0  2.333333    4.5  4.333333  13.166667       4.333333       8.833333
So, how can I group together the values of the same (***) columns?
【Answer 1】: You can pivot with margins=True:
new_df = (df.pivot_table(index='A', columns='C',
                         values=['D', 'E'], aggfunc='sum',
                         margins=True, margins_name='partial_total')
            .assign(total=lambda x: x.loc[:, (slice(None), 'partial_total')].sum(axis=1))
          )
Output:
                   D                            E                        total
C              large  small  partial_total  large  small  partial_total
A
bar               11     11             22     15     17             32     54
foo                4      7             11      9     13             22     33
partial_total     15     18             33     24     30             54     87
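Two details of this output are worth noting: aggfunc='sum' replaces the default mean that the question's pivot used, and margins=True appends a partial_total row as well as the partial_total columns. Below is a minimal sketch, assuming only the column subtotals are wanted, that drops the extra row afterwards. The name without_row is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})

# Same pivot as in the answer: sums, with margins named 'partial_total'.
new_df = (
    df.pivot_table(index="A", columns="C", values=["D", "E"], aggfunc="sum",
                   margins=True, margins_name="partial_total")
      .assign(total=lambda x: x.loc[:, (slice(None), "partial_total")].sum(axis=1))
)

# margins=True also adds a bottom row; drop it if only column subtotals are needed.
without_row = new_df.drop(index="partial_total")
print(without_row)
```

The columns keep the order D, E, total, because the margin columns are created inside each top-level group.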
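【Answer 2】: Try this with pd.concat:
table = pd.pivot_table(df, values=['D', 'E'], index=['A'], columns=['C'])
# Flatten the two column levels into single names such as 'D_large'
table.columns = [f'{i}_{j}' for i, j in table.columns]
pd.concat([table,
           # sum(axis=1, level=0) was removed in pandas 2.0; group the transposed frame instead
           table.T.groupby(level=0).sum().T.add_suffix('_partial_total'),
           table.sum(axis=1).to_frame(name='total')], axis=1)
Output: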
D_large D_small E_large E_small D_large_partial_total D_small_partial_total E_large_partial_total E_small_partial_total total
A
bar 5.5 5.500000 7.5 8.500000 5.5 5.500000 7.5 8.500000 27.000000
foo 2.0 2.333333 4.5 4.333333 2.0 2.333333 4.5 4.333333 13.166667
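In the output above the *_partial_total columns simply repeat the pivoted columns, because the names were flattened before the level-0 sum was taken. The following sketch is one possible variant, not the answerer's code: it computes the subtotal per top-level column (D, E) first and flattens afterwards. The names partial, flat, wanted and result are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})

table = pd.pivot_table(df, values=["D", "E"], index="A", columns="C")

# Sum large + small within each top-level group while the MultiIndex still exists.
partial = table.T.groupby(level=0).sum().T.add_suffix("_partial_total")

# Flatten the column MultiIndex into names such as 'D_large'.
flat = table.copy()
flat.columns = [f"{i}_{j}" for i, j in flat.columns]

result = pd.concat(
    [flat, partial, table.sum(axis=1).to_frame(name="total")], axis=1
)

# Put each subtotal next to its own group and the grand total last.
wanted = [f"{t}_{c}" for t in ["D", "E"] for c in ["large", "small", "partial_total"]]
result = result[wanted + ["total"]]
print(result)
```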
【Answer 3】: Try doing the aggregation before pivot_table:
g = df.groupby(['A', 'C'])[['D', 'E']]
d = (g.sum()/g.count()).reset_index()
m = d.groupby('A', as_index=False).sum().assign(C='partial')
final = pd.concat([m, d]).pivot_table(index='A', columns='C')
Output:
         D                           E
C    large     small    partial  large     small    partial
A
bar    5.5  5.500000  11.000000    7.5  8.500000  16.000000
foo    2.0  2.333333   4.333333    4.5  4.333333   8.833333
To answer your last question specifically:
How can I group together the values of the same (***) columns?
You can use sort_index on the columns:
table.sort_index(axis=1)
         D                               E                               total
C    large  partial_total     small  large  partial_total     small
A
bar    5.5      11.000000  5.500000    7.5      16.000000  8.500000  27.000000
foo    2.0       4.333333  2.333333    4.5       8.833333  4.333333  13.166667
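sort_index orders the second level alphabetically, so partial_total lands between large and small. If the order requested in the question matters (large, small, partial_total within D and E, then total), one option is to spell the order out and reindex the columns. This is a sketch under that assumption; the names order and ordered are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})

# Rebuild the table from the question, totals included.
table = pd.pivot_table(df, values=["D", "E"], index=["A"], columns=["C"])
table["total"] = table.sum(axis=1)
for t in ["D", "E"]:
    table[t, "partial_total"] = table[t].sum(axis=1)

# Desired column order as (top level, second level) tuples.
# The 'total' column has an empty string as its second-level label.
order = [(t, c) for t in ["D", "E"] for c in ["large", "small", "partial_total"]]
order.append(("total", ""))

ordered = table.reindex(
    columns=pd.MultiIndex.from_tuples(order, names=table.columns.names)
)
print(ordered)
```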