pandas: select rows - based on list - DF with duplicate row labels

Posted 2023-03-12

【Title】: pandas: select rows - based on list - DF with duplicate rows labels 【Posted】: 2016-02-20 03:37:50 【Question】:

Similar but not the same: Selecting rows - based on a list - from a DF with duplicated columns

I have two DataFrames:

import pandas as pd

df1 = pd.DataFrame({'total': [25, 45, 75, 36, 45]},
                   index=['base', 'c', 'd', 'base', 'e'])
      total
base     25
c        45
d        75
base     36
e        45

df2 = pd.DataFrame({'type': ['rc', 'rc', 'c%', 'c%', 'pp%']},
                   index=['base', 'c', 'd', 'base', 'e'])

     type
base   rc
c      rc
d      c%
base   c%
e      pp%

I want to get the rows of df1 for which the value in df2 is 'c%' and/or 'pp%'.

This is how I do it:

keep = df2[df2['type'].isin(['c%', 'pp%'])].index
Index([u'd', u'base', u'e'], dtype='object')

df1.loc[keep]
      total
d        75
base     25
base     36
e        45

'base 25' should not be there, but since I am selecting by label I understand why it is.
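The label behaviour described above can be reproduced in isolation. This is only a minimal sketch; the names `base_rows` and `labels_unique` are introduced here for illustration and are not part of the original question.

```python
import pandas as pd

df1 = pd.DataFrame({'total': [25, 45, 75, 36, 45]},
                   index=['base', 'c', 'd', 'base', 'e'])

# A label lookup returns every row that carries the label,
# so 'base' yields both the 25 row and the 36 row.
base_rows = df1.loc['base']
print(base_rows)

# Index.is_unique tells whether a label identifies exactly one row.
labels_unique = df1.index.is_unique
print(labels_unique)
```

Because the index is not unique, any selection that passes through labels (such as `df1.loc[keep]`) cannot tell the two 'base' rows apart.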

Desired result:

      total
d        75
base     36
e        45

How can I change my code to handle this?

【Answer 1】:

In [9]:

(df2['type'] == 'c%') | (df2['type'] == 'pp%')
Out[9]:
base    False
c       False
d        True
base     True
e        True
Name: type, dtype: bool

In [8]:
df1[(df2['type'] == 'c%') | (df2['type'] == 'pp%')]
Out[8]:
     total
d      75
base   36
e      45
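The two equality tests can also be written as a single `isin` call, which is what the question already uses to build `keep`. The following is a sketch under one assumption: df1 and df2 list their rows in the same order. The names `mask` and `result` are illustrative, and converting the mask to a NumPy array is a choice made here so that rows are matched by position rather than by the duplicated labels.

```python
import pandas as pd

df1 = pd.DataFrame({'total': [25, 45, 75, 36, 45]},
                   index=['base', 'c', 'd', 'base', 'e'])
df2 = pd.DataFrame({'type': ['rc', 'rc', 'c%', 'c%', 'pp%']},
                   index=['base', 'c', 'd', 'base', 'e'])

# One boolean per row of df2, in row order.
mask = df2['type'].isin(['c%', 'pp%'])

# to_numpy() drops the labels, so the mask is applied by position.
# Assumption: df1 and df2 have the same number of rows in the same order.
result = df1[mask.to_numpy()]
print(result)
```

The difference from the code in the question is that the boolean mask is applied directly instead of being turned into a list of labels first, so the first 'base' row is never pulled in.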

【Comments】:

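How about resetting the index?

Because the filtering might not be based on booleans.

Do you want to slice based on true/false values or based on indices?

Indices. Apologies, true/false was just an example that was easier to explain.

You mentioned "I want to get the rows which are 'True' from df2 from df1"; actually, I can't understand your question.

I have updated my question; I hope it is a bit clearer. Using true/false was not a good idea.

【Answer 2】: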

Is this what you want?

In [54]: df1[['total']][df2['bool']=='True']
Out[54]: 
      total
d        75
base     36
e        45
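The comments above raise resetting the index as another option. A possible sketch of that idea, applied to the `type` column from the question, follows. It assumes df1 and df2 list their rows in the same order; the column name `label` and the variables `df1_pos`, `df2_pos`, `keep`, and `result` are hypothetical names chosen for this example.

```python
import pandas as pd

df1 = pd.DataFrame({'total': [25, 45, 75, 36, 45]},
                   index=['base', 'c', 'd', 'base', 'e'])
df2 = pd.DataFrame({'type': ['rc', 'rc', 'c%', 'c%', 'pp%']},
                   index=['base', 'c', 'd', 'base', 'e'])

# Move the duplicated labels into a column ('label' is an arbitrary name)
# so that every row gets a unique integer index.
df1_pos = df1.rename_axis('label').reset_index()
df2_pos = df2.rename_axis('label').reset_index()

# Same pattern as in the question, but the index is now unique.
keep = df2_pos[df2_pos['type'].isin(['c%', 'pp%'])].index

# Each entry of keep selects exactly one row; then restore the labels.
result = df1_pos.loc[keep].set_index('label').rename_axis(None)
print(result)
```

This keeps the question's two-step shape (build `keep`, then select with `.loc`) while avoiding the ambiguity caused by the repeated 'base' label.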
