Python pandas -> select by condition in columns name

Posted 2023-03-12


【Title】Python pandas -> select by condition in columns name 【Posted】2017-08-30 07:27:44 【Question】:

I have a df whose column names are 'a', 'b', 'c' ... 'z'.

print(my_df.columns)
Index(['a', 'b', 'c', ... 'y', 'z'],
  dtype='object', name=0)

I have functions that determine which columns should be displayed. For example:

start = con_start()
stop = con_stop()
print((my_df.columns >= start) & (my_df.columns <= stop))

My result is:

[False False ... False False False False  True  True
True  True False False]

My goal is to display the dataframe with only the columns that satisfy my condition. If start = 'a' and stop = 'b', I want:

0                                      a              b         
index1       index2                                                  
New York     New York           0.000000       0.000000          
California   Los Angeles   207066.666667  214466.666667     
Illinois     Chicago       138400.000000  143633.333333     
Pennsylvania Philadelphia   53000.000000   53633.333333      
Arizona      Phoenix       111833.333333  114366.666667


【Answer 1】:

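You can do this with a label slice through .loc. Note that, unlike positional slicing, a label slice includes both endpoints:

df.loc[:, 'a':'b']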
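A minimal sketch of the label slice in action. The small frame below is a made-up stand-in for the asker's data (its column names and values are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame: four columns 'a'..'d'.
df = pd.DataFrame(
    [[0.0, 0.0, 1.0, 2.0], [207066.7, 214466.7, 3.0, 4.0]],
    columns=list("abcd"),
)

# A .loc label slice is inclusive on both ends, so 'b' is kept.
selected = df.loc[:, "a":"b"]
print(list(selected.columns))
```

This relies on the column labels being unique, or at least sorted; with duplicated, unsorted labels a label slice can raise a KeyError.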


【Answer 2】:

I wanted to make this robust, with as few assumptions as possible.

Option 1: use iloc with positional slicing. Assumptions:

my_df.columns.is_unique evaluates to True
the columns are already in order

start = df.columns.get_loc(con_start())
stop = df.columns.get_loc(con_stop())

df.iloc[:, start:stop + 1]
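As a runnable sketch of Option 1: the bodies of con_start() and con_stop() below are hypothetical placeholders (the question never shows them), and the frame is made up for illustration.

```python
import pandas as pd

# Made-up frame with unique, ordered column labels.
df = pd.DataFrame([[1, 2, 3, 4, 5]], columns=list("abcde"))

def con_start():
    # Hypothetical stand-in for the asker's function.
    return "b"

def con_stop():
    # Hypothetical stand-in for the asker's function.
    return "d"

# get_loc returns a plain integer only when the label is unique,
# which is why Option 1 assumes columns.is_unique.
start = df.columns.get_loc(con_start())
stop = df.columns.get_loc(con_stop())

# iloc slices exclude the end position, hence the + 1.
subset = df.iloc[:, start:stop + 1]
print(list(subset.columns))
```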

Option 2: use loc with a boolean mask. Assumption:

the column values are comparable

start = con_start()
stop = con_stop()

c = df.columns.values
m = (start <= c) & (stop >= c)

df.loc[:, m]
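A small sketch of the boolean-mask approach on deliberately unsorted columns, to show that it does not depend on column order. The frame and the start/stop values are assumptions for illustration; the mask is built by comparing the column Index directly, as in the question.

```python
import pandas as pd

# Made-up frame whose columns are NOT in alphabetical order.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["c", "a", "d", "b"])

start, stop = "a", "b"

# Elementwise comparison of the column labels yields a boolean array.
m = (df.columns >= start) & (df.columns <= stop)

# Columns keep their original relative order; only matching ones remain.
subset = df.loc[:, m]
print(list(subset.columns))
```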


【Answer 3】:

Build the list of columns to display:

cols = [x for x in my_df.columns if start <= x <= stop]

Then use only those columns of your DataFrame:

my_df[cols]
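One way to package these two steps is a small helper. The function name columns_between and the sample frame are hypothetical, introduced here only for illustration:

```python
import pandas as pd

def columns_between(frame, start, stop):
    """Return frame restricted to columns whose labels satisfy start <= label <= stop."""
    cols = [x for x in frame.columns if start <= x <= stop]
    return frame[cols]

# Made-up frame for demonstration.
my_df = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "c"])

picked = columns_between(my_df, "a", "b")
# When no label falls in the range, the result has the same rows and no columns.
empty = columns_between(my_df, "x", "z")
print(list(picked.columns), empty.shape)
```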


【Answer 4】:

Assuming result is your [True/False] array and letters is [a...z]:

res = [letters[i] for i, r in enumerate(result) if r]
new_df = df[res]
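A runnable sketch of this answer with made-up data (the frame and the contents of result are assumptions). It also shows that, when result has one entry per column, the boolean array can be handed to .loc directly without first building a list of names:

```python
import numpy as np
import pandas as pd

# Made-up frame and mask; one mask entry per column.
df = pd.DataFrame([[1, 2, 3, 4]], columns=list("abcd"))
result = np.array([False, True, True, False])
letters = list(df.columns)

# Turn the mask into column names, then select by name.
res = [letters[i] for i, r in enumerate(result) if r]
new_df = df[res]

# Shortcut: pass the boolean array straight to .loc.
same_df = df.loc[:, result]
print(res)
```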


【Answer 5】:

If your condition is about as complex as the one in your example, you don't need any extra functions; just filter, for example:

sweet_and_red_fruit = fruit[(fruit["sweet"] == 1) & (fruit["colour"] == "red")]
print(sweet_and_red_fruit)

Or, if you only want to print:

print(fruit[(fruit["sweet"] == 1) & (fruit["colour"] == "red")])
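For reference, a self-contained version of this filter. The fruit frame below is invented for illustration (the answer does not show its data). Note that this filters rows by their values, rather than selecting columns by name as the question asks:

```python
import pandas as pd

# Hypothetical data; only the "sweet" and "colour" column names come from the answer.
fruit = pd.DataFrame({
    "name": ["apple", "lemon", "cherry", "banana"],
    "sweet": [1, 0, 1, 1],
    "colour": ["red", "yellow", "red", "yellow"],
})

# Each comparison needs its own parentheses because & binds tighter than ==.
sweet_and_red_fruit = fruit[(fruit["sweet"] == 1) & (fruit["colour"] == "red")]
print(sweet_and_red_fruit)
```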

