Multiple column selection based on values
Posted: 2019-04-10 19:19:48
Question: I have the following DataFrame:
import pandas as pd

df = pd.DataFrame({
    'Group': [1, 1, 1, 2, 2, 2, 2],
    'Type': ["High", "Medium", "Low", "High", "Medium", "Low", "Low"],
    'set_0': ["a", "a", "a", "a", "a", "a", "a"],
    'set_1': ["b", "b", "b", "c", "c", "c", "d"],
    'set_2': ["e", "e", "e", "NULL", "NULL", "f", "f"],
    'set_3': ["g", "g", "NULL", "NULL", "NULL", "NULL", "NULL"],
    'set_4': ["NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL"],
    'set_5': ["NULL", "NULL", "NULL", "NULL", "NULL", "NULL", "NULL"],
    'set_6': ["h", "h", "NULL", "NULL", "NULL", "NULL", "NULL"],
})
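I want to drop some of the "set_" columns. If a "set_" column contains only "NULL" values, I don't want the code to keep it. I only want to keep the set_ columns that contain at least one non-"NULL" value.
How can I handle this without hard-coding the column names?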
Answer 1: First select the object-dtype columns and compare them against the string you specified. Then use pd.DataFrame.loc with Boolean indexing, or pd.DataFrame.drop:
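# True for each object-dtype column whose values are all the string 'NULL'
idx = df.select_dtypes(['object']).eq('NULL').all()
# keep every column that is not flagged in idx
df = df.loc[:, ~df.columns.isin(idx[idx].index)]
# alternative:
# df = df.drop(idx[idx].index, axis=1)
print(df)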
   Group    Type set_0 set_1 set_2 set_3 set_6
0      1    High     a     b     e     g     h
1      1  Medium     a     b     e     g     h
2      1     Low     a     b     e  NULL  NULL
3      2    High     a     c  NULL  NULL  NULL
4      2  Medium     a     c  NULL  NULL  NULL
5      2     Low     a     c     f  NULL  NULL
6      2     Low     a     d     f  NULL  NULL
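The question only asks about the set_ columns, whereas the check in this answer runs over every object-dtype column, so an all-"NULL" column outside the set_ family would be dropped as well. A minimal sketch of a variant restricted to a name prefix might look like the following. The helper name drop_all_null_sets, its prefix and marker parameters, and the small demo frame are hypothetical illustrations, not part of the original answer:

```python
import pandas as pd


def drop_all_null_sets(df, prefix="set_", marker="NULL"):
    """Return a copy of df without the `prefix` columns that hold only `marker`.

    Hypothetical helper: the name and parameters are assumptions for illustration.
    """
    # Only columns whose name starts with the prefix are candidates for removal.
    candidates = [c for c in df.columns if str(c).startswith(prefix)]
    # Boolean Series indexed by candidate column name:
    # True where every row of that column equals the marker string.
    all_marker = df[candidates].eq(marker).all()
    # Drop just the flagged candidates; all other columns are left untouched.
    return df.drop(columns=all_marker[all_marker].index)


# Small demo frame (made up): 'Note' is all "NULL" but is not a set_ column.
demo = pd.DataFrame({
    "Group": [1, 1, 2],
    "Note": ["NULL", "NULL", "NULL"],
    "set_0": ["a", "a", "a"],
    "set_1": ["b", "NULL", "NULL"],
    "set_2": ["NULL", "NULL", "NULL"],
})

result = drop_all_null_sets(demo)
print(list(result.columns))  # ['Group', 'Note', 'set_0', 'set_1']
```

Because the candidates are chosen by column name rather than by dtype, 'Note' survives even though it contains only "NULL", and only set_2 is removed.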