Pandas subtotals and grand total of pivot tables with a MultiIndex

Posted 2023-03-12

[Original title]: Pandas sub- and total of pivot tables with multiindex [Asked]: 2020-08-30 03:48:09 [Question]:

I am trying to add a new column with the subtotals and a final column with the grand total. For example,

import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
                   "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
                   "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
                   "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
                   "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})

which gives:

     A    B      C  D  E
0  foo  one  small  1  2
1  foo  one  large  2  4
2  foo  one  large  2  5
3  foo  two  small  3  5
4  foo  two  small  3  6
5  bar  one  large  4  6
6  bar  one  small  5  8
7  bar  two  small  6  9
8  bar  two  large  7  9

Now I pivot:

table = pd.pivot_table(df, values=['D',"E"], index=['A'],columns=['C'])

and add the totals:

table['total'] = table.sum(axis=1)
for t in ["D", "E"]:
   table[t, "partial_total"]  = table[t].sum(axis=1)

While this works numerically, it is visually annoying. I would like all the data for D together (including partial_total), then E, then total. This is my resulting df:

        D               E                total             D             E
C   large     small large     small            partial_total partial_total
A                                                                         
bar   5.5  5.500000   7.5  8.500000  27.000000     11.000000     16.000000
foo   2.0  2.333333   4.5  4.333333  13.166667      4.333333      8.833333

So:

How can I group together the values of the same (***) columns?

[Answer 1]:

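You can pivot with margins. Note that this uses aggfunc='sum', so the cells hold sums rather than the means produced by the default aggfunc in the question: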

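# margins=True appends a 'partial_total' column under each of D and E
# (and a 'partial_total' row at the bottom).
new_df = (df.pivot_table(index='A', columns='C',
                         values=['D', 'E'], aggfunc='sum',
                         margins=True, margins_name='partial_total')
            # grand total = sum of the per-value subtotal columns
            .assign(total=lambda x: x.loc[:, (slice(None), 'partial_total')].sum(axis=1))
)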

Output:

                D                               E                               total
C               large   small   partial_total   large   small   partial_total   
A                           
bar             11      11      22              15      17      32              54
foo             4       7       11              9       13      22              33
partial_total   15      18      33              24      30      54              87
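
The `(slice(None), 'partial_total')` selector in the answer above can be hard to read. As a minimal sketch, the same selection can be written with `pd.IndexSlice`; the names `pivoted`, `subtotals` and `grand_total` are only illustrative and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})

pivoted = df.pivot_table(index="A", columns="C", values=["D", "E"],
                         aggfunc="sum", margins=True,
                         margins_name="partial_total")

# idx[:, "partial_total"] is shorthand for (slice(None), "partial_total"):
# any label on the first column level, "partial_total" on the second.
idx = pd.IndexSlice
subtotals = pivoted.loc[:, idx[:, "partial_total"]]

# Row-wise sum of the two subtotal columns gives the grand total per row.
grand_total = subtotals.sum(axis=1)
print(grand_total)
```

Here `subtotals` contains only the columns `('D', 'partial_total')` and `('E', 'partial_total')`, so summing it along `axis=1` adds the D and E subtotals for each row.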

[Answer 2]:

Try this with pd.concat:

table = pd.pivot_table(df, values=['D', 'E'], index=['A'], columns=['C'])
# Subtotal per top-level column (D, E), computed before the columns are flattened.
# sum(axis=1, level=0) was removed in pandas 2.0; groupby on the transpose is equivalent.
partial = table.T.groupby(level=0).sum().T.add_suffix('_partial_total')
total = table.sum(axis=1).to_frame(name='total')
table.columns = [f'{i}_{j}' for i, j in table.columns]
pd.concat([table, partial, total], axis=1)

Output:

     D_large   D_small  E_large   E_small  D_partial_total  E_partial_total      total
A
bar      5.5  5.500000      7.5  8.500000        11.000000        16.000000  27.000000
foo      2.0  2.333333      4.5  4.333333         4.333333         8.833333  13.166667

[Answer 3]:

Try doing the work before pivot_table:

g = df.groupby(['A', 'C'])[['D', 'E']]

d = (g.sum()/g.count()).reset_index()
m = d.groupby('A', as_index=False).sum().assign(C='partial')

final = pd.concat([m, d]).pivot_table(index='A', columns='C')

        D                          E                     
C   large     small    partial large     small    partial
A                                                        
bar   5.5  5.500000  11.000000   7.5  8.500000  16.000000
foo   2.0  2.333333   4.333333   4.5  4.333333   8.833333
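
As a small check of what the pre-aggregation step computes: `g.sum()/g.count()` is the per-group mean, and the "partial" column is the sum of those means within each `A`. The sketch below assumes the same `df` as in the question; `manual_mean`, `builtin_mean` and `subtotal` are illustrative names only.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})

g = df.groupby(["A", "C"])[["D", "E"]]

# sum / count per (A, C) group is the group mean ...
manual_mean = g.sum() / g.count()
# ... which is what the built-in mean returns as well.
builtin_mean = g.mean()
same = bool((manual_mean - builtin_mean).abs().max().max() < 1e-12)

# Summing the per-(A, C) means within each A gives the "partial" values.
subtotal = builtin_mean.groupby(level="A").sum()
print(same)
print(subtotal)
```

So the subtotals in this answer are sums of means, matching the numbers in the question, rather than sums of the raw values.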

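To answer your last question specifically:

How can I group together the values of the same (***) columns?

You can use sort_index, which groups the columns by their top-level label (within each group the sub-columns are ordered alphabetically):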

table.sort_index(axis=1)

        D                             E                              total
C   large partial_total     small large partial_total     small           
A                                                                         
bar   5.5     11.000000  5.500000   7.5     16.000000  8.500000  27.000000
foo   2.0      4.333333  2.333333   4.5      8.833333  4.333333  13.166667
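
sort_index orders the second level alphabetically, so partial_total lands between large and small. If partial_total should come last within each group, one option is to spell out the column order and reindex. This is a minimal sketch rather than part of the answer; `desired` and `ordered` are illustrative names, and it assumes the `table` built in the question, where the total column has an empty second-level label.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
    "C": ["small", "large", "large", "small", "small", "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
})

# Rebuild the table exactly as in the question.
table = pd.pivot_table(df, values=["D", "E"], index=["A"], columns=["C"])
table["total"] = table.sum(axis=1)
for t in ["D", "E"]:
    table[t, "partial_total"] = table[t].sum(axis=1)

# Explicit column order: D block, E block, then the grand total.
# The 'total' column has an empty string as its second-level label.
desired = [
    ("D", "large"), ("D", "small"), ("D", "partial_total"),
    ("E", "large"), ("E", "small"), ("E", "partial_total"),
    ("total", ""),
]
ordered = table.reindex(
    columns=pd.MultiIndex.from_tuples(desired, names=table.columns.names)
)
print(ordered)
```

Listing the columns by hand is more verbose than sort_index, but it gives full control over the order inside each group.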
