Pandas Custom Sort Row in Multiindex

Posted 2023-03-12

【Title】: Pandas Custom Sort Row in Multiindex 【Posted】: 2017-06-16 09:50:49 【Question】:

Given the following:

import numpy as np
import pandas as pd
arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz'],
          ['total', 'two', 'one', 'two', 'four', 'total', 'five']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.random.randn(7), index=index)
s

first  second
bar    total     0.334158
       two      -0.267854
       one       1.161727
baz    two      -0.748685
       four     -0.888634
       total     0.383310
       five      0.506120
dtype: float64

How can I make sure the 'total' row (in the second index level) always ends up at the bottom of each group, like this?

first  second
bar    one       0.210911
       two       0.628357
       total    -0.911331
baz    two       0.315396
       four     -0.195451
       five      0.060159
       total     0.638313
dtype: float64

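【Comments】:

The simplest option is to call it "~total" or "|total|" or "total". Then it will always be sorted to the bottom. Please don't mention deadlines in the question: remember that almost everyone who answers is a volunteer. Sorry about that.

【Answer 1】: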

Solution 1

I'm not satisfied with this one. I'm working on a different solution.

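# Pivot 'first' into columns so that each 'second' label becomes one row
unstacked = s.unstack(0)
# Move the 'total' row to the end (DataFrame.append was removed in pandas 2.0)
order = unstacked.index.drop('total').tolist() + ['total']
unstacked.reindex(order).unstack().dropna()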

first  second
bar    one       1.682996
       two       0.343783
       total     1.287503
baz    five      0.360170
       four      1.113498
       two       0.083691
       total    -0.377132
dtype: float64

Solution 2

I think this one is better.

second = pd.Categorical(
    s.index.levels[1].values,
    categories=['one', 'two', 'three', 'four', 'five', 'total'],
    ordered=True
)
# set_levels returns a new index; its inplace argument was removed in pandas 2.0
s.index = s.index.set_levels(second, level='second')

cols = list(s.index.names)
s.reset_index().sort_values(cols).set_index(cols)

                     0
first second          
bar   one     1.682996
      two     0.343783
      total   1.287503
baz   two     0.083691
      four    1.113498
      five    0.360170
      total  -0.377132
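
The same idea can be sketched without touching the index levels at all: turn the `second` label into an ordered categorical column and sort on it. This is only a minimal sketch under some assumptions: the values are a fixed `np.arange(7.0)` instead of random numbers so the result is easy to check, and the names `order`, `value` and `result` are made up for illustration.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz'],
          ['total', 'two', 'one', 'two', 'four', 'total', 'five']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
# Fixed values (an assumption for this sketch) instead of np.random.randn(7)
s = pd.Series(np.arange(7.0), index=index)

# Desired order of the 'second' labels; 'total' comes last
order = ['one', 'two', 'three', 'four', 'five', 'total']

# Move the index into columns and make 'second' an ordered categorical
df = s.rename('value').reset_index()
df['second'] = pd.Categorical(df['second'], categories=order, ordered=True)

# sort_values follows the category order, not the alphabetical order
result = df.sort_values(['first', 'second']).set_index(['first', 'second'])['value']
print(result)
```

Any label that is missing from `order` would become NaN in the categorical column, so the list has to cover every label that can occur.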

【Answer 2】:

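Use unstack to create a DataFrame whose columns come from the second level of the MultiIndex, then move the total column to the last position, and finally convert the columns to an ordered CategoricalIndex.

That way, after stack, total is the last entry of the second level within each group.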

import numpy as np
import pandas as pd

np.random.seed(123)
arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz'],
          ['total', 'two', 'one', 'two', 'four', 'total', 'five']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
s = pd.Series(np.random.randn(7), index=index)
print (s)
first  second
bar    total    -1.085631
       two       0.997345
       one       0.282978
baz    two      -1.506295
       four     -0.578600
       total     1.651437
       five     -2.426679
dtype: float64

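df = s.unstack()
# Move the 'total' column to the last position
df = df[df.columns[df.columns != 'total'].tolist() + ['total']]
# Pass categories explicitly; otherwise they are inferred in sorted (alphabetical) order
df.columns = pd.CategoricalIndex(df.columns, categories=df.columns, ordered=True)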
print (df)
second      five    four       one       two     total
first                                                 
bar          NaN     NaN  0.282978  0.997345 -1.085631
baz    -2.426679 -0.5786       NaN -1.506295  1.651437

s1 = df.stack().dropna()  # dropna() is harmless where stack() already drops NaN
print (s1)
first  second
bar    one       0.282978
       two       0.997345
       total    -1.085631
baz    five     -2.426679
       four     -0.578600
       two      -1.506295
       total     1.651437
dtype: float64

print (s1.sort_index())
first  second
bar    one       0.282978
       two       0.997345
       total    -1.085631
baz    five     -2.426679
       four     -0.578600
       two      -1.506295
       total     1.651437
dtype: float64
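
For comparison, here is a different technique from the one in the answer above: newer pandas versions (1.1 and later) accept a `key` argument in `sort_index`, which is applied to each index level separately. A minimal sketch, assuming fixed values instead of random ones; the `rank` mapping and the `second_key` helper are hypothetical names introduced only for this example.

```python
import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'bar', 'baz', 'baz', 'baz', 'baz'],
          ['total', 'two', 'one', 'two', 'four', 'total', 'five']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
# Fixed values (an assumption for this sketch) so the result is easy to follow
s = pd.Series(np.arange(7.0), index=index)

# Hypothetical sort rank for each 'second' label; 'total' gets the highest rank
rank = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'total': 6}

def second_key(level):
    """Sort key, called once per index level with that level's values."""
    if level.name == 'second':
        return level.map(rank)  # labels missing from rank would map to NaN
    return level  # leave the 'first' level in its natural order

# The key only influences the ordering; the labels themselves are unchanged
result = s.sort_index(key=second_key)
print(result)
```

Compared with the categorical approach, this leaves both the index and its dtype untouched, at the cost of requiring pandas 1.1 or later.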
