Slicing a MultiIndex DataFrame with a condition based on the index [duplicate]

Posted 2023-02-23

【Posted】: 2018-11-09 13:10:28 【Question】:

I have a DataFrame that looks like this:

import pandas as pd
import numpy as np

arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']), np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
df = pd.DataFrame([[24, 13,  8,  9],
   [11, 30,  7, 23],
   [21, 31, 12, 30],
   [ 2,  5, 19, 24],
   [15, 18,  3, 16],
   [ 2, 24, 28, 11],
   [23,  9,  6, 12],
   [29, 28, 11, 21]], index=arrays, columns=list('abcd'))


df
          a   b   c   d
bar one  24  13   8   9
    two  11  30   7  23
baz one  21  31  12  30
    two   2   5  19  24
foo one  15  18   3  16
    two   2  24  28  11
qux one  23   9   6  12
    two  29  28  11  21

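I want to slice the DataFrame so that the result contains all rows that have foo as their first-level index value, plus all rows that have bar as the first-level index and two as the second-level index. That is, the resulting DataFrame should look like this: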

          a   b   c   d
bar two  11  30   7  23
foo one  15  18   3  16
    two   2  24  28  11

One way to get this result is

pd.concat([df.loc[[('bar', 'two')],:], df.loc[('foo', slice(None)),:]])

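but this feels cumbersome; there must be a more "pythonic" way.

【Comments on the question】:

Why not just reset_index :-)
@Wen I just thought there must be a way to get the result using the .loc / .xs methods, but I couldn't figure it out.

【Answer 1】:

query to the rescue: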

df.query('ilevel_0 == "foo" or (ilevel_0 == "bar" and ilevel_1 == "two")')

          a   b   c   d
bar two  11  30   7  23
foo one  15  18   3  16
    two   2  24  28  11

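A single xs or loc slice will not work here, because the selection is not consistent across levels: foo needs every second-level value, while bar needs only two.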
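If the index levels have names, the same query can refer to them by name instead of `ilevel_0` and `ilevel_1`. The following is a minimal sketch of that variant; the level names `outer` and `inner` are made up for this example and do not appear in the question.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
arrays = [
    np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
    np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
]
df = pd.DataFrame(
    [[24, 13, 8, 9],
     [11, 30, 7, 23],
     [21, 31, 12, 30],
     [2, 5, 19, 24],
     [15, 18, 3, 16],
     [2, 24, 28, 11],
     [23, 9, 6, 12],
     [29, 28, 11, 21]],
    index=arrays, columns=list('abcd'))

# 'outer' and 'inner' are hypothetical level names chosen for illustration.
named = df.rename_axis(['outer', 'inner'])

# Named index levels can be used in query() like column names.
result = named.query('outer == "foo" or (outer == "bar" and inner == "two")')
print(result)
```

Naming the levels keeps the query readable and avoids depending on the positional `ilevel_N` aliases.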

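【Comments】:

A year of working with pandas and I had never heard of `.query`. Is there a way to name the index levels so that those names can be used instead of "ilevel_0", "ilevel_1", etc.?
@crs Sure, use df = df.rename_axis(['name1', 'name2']), and then you can replace ilevel_* with your names.

【Answer 2】:

You can use plain boolean indexing on the index level values: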

l0 = df.index.get_level_values(0)
l1 = df.index.get_level_values(1)
cond = (l0 == "foo") | ((l0=="bar") & (l1=="two"))
df[cond]

Output

          a   b   c   d
bar two  11  30   7  23
foo one  15  18   3  16
    two   2  24  28  11
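As a quick sanity check, the boolean mask can be compared against the `pd.concat` approach from the question. This is a self-contained sketch that reuses the question's data; the variable names `expected` and `result` are introduced here only for the comparison.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
arrays = [
    np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
    np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']),
]
df = pd.DataFrame(
    [[24, 13, 8, 9],
     [11, 30, 7, 23],
     [21, 31, 12, 30],
     [2, 5, 19, 24],
     [15, 18, 3, 16],
     [2, 24, 28, 11],
     [23, 9, 6, 12],
     [29, 28, 11, 21]],
    index=arrays, columns=list('abcd'))

# Approach from the question: concatenate two separate .loc selections.
expected = pd.concat([df.loc[[('bar', 'two')], :],
                      df.loc[('foo', slice(None)), :]])

# Approach from this answer: build one boolean mask from the level values.
l0 = df.index.get_level_values(0)
l1 = df.index.get_level_values(1)
cond = (l0 == "foo") | ((l0 == "bar") & (l1 == "two"))
result = df[cond]

# Compare the selected row labels and the underlying values.
print(list(result.index) == list(expected.index))
print((result.to_numpy() == expected.to_numpy()).all())
```

The mask keeps rows in their original order, which here happens to match the order produced by the concatenation.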
