How to filter the items in a DataFrame based on a list in pandas?

Posted: 2021-01-14 18:41:34

Question:

I'm new to coding and am trying to work with the following data:

df=
    Position
    A/C MECHANIC
    A/C TECHNICIAN
    A/C TECHNICIAN HELPER
    ACCOUNTANT
    ACCOUNTANT MANAGER
    ACCOUNTING CLERK
    ACCOUNTS AUDITOR
    ACCOUNTS MANAGER
    ACCOUNTS SUPERVISOR
    ACTING HOSPITAL ADMINISTRATOR
    ADMINISTRATION SECRETARY
    ADMINISTRATIVE  SUPERVISOR
    ADMINISTRATIVE CLERK
    ADMINISTRATIVE COORDINATOR
    ADMINISTRATIVE DIRECTOR
    ADMINISTRATIVE MANAGER
    ADMINISTRATOR OF MED.INSURANCE
    ADMINSTRATION OFFICE MANAGER
    ADMISSION COUNTER CLERK
    ADMISSION OFFICER

I have the following list:

name=['TECHNICIAN', 'MANAGER', 'CLERK', 'AUDITOR', 'SUPERVISOR', 'SECRETARY', 'COORDINATOR', 'DIRECTOR', 'OFFICER', 'SPECIALIST', 'PROGRAMMER', 'TYPIST', 'LIASON', 'DESIGNER', 'ENGINEER', 'ACCOUNTANT', 'ADMINISTRATOR', 'BAKER', 'COOK']

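I'm trying to create a new DataFrame that takes each word from the list above, finds every position containing that word, and puts those positions in that word's own column of the new DataFrame.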

This is the code I'm using:

newdf=pd.DataFrame()
for i in name:
  print(i)
  newdf[i]=df[df['Position'].str.contains(i)]

I'm trying to add each set of filtered values as a new column in newdf.

When I run the code above, I get this error:

ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series

I'm trying to get the following output:

TECHNICIAN,              MANAGER,
A/C TECHNICIAN           ACCOUNTANT MANAGER
ALUMINUM TECHNICIAN      ACCOUNTS MANAGER
ANAESTHESIA TECHNICIAN   ADMINISTRATIVE MANAGER
APPLIANCE TECHNICIAN    
BIOMEDICAL SENIOR   
BIOMEDICAL TECHNICIAN   
BOILER TECHNICIAN   
COMPUTER TECHNICIAN 
COMPUTER TECHNICIAN 
COMPUTER TECHNICIAN
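A likely explanation of the error: df[mask] returns a whole DataFrame, and in pandas versions from around that time an empty newdf could only take its first column from something convertible to a Series, hence the ValueError (the exact message and behaviour may differ between pandas versions). Selecting only the 'Position' column avoids that error but exposes a second problem with the loop: every newdf[i] = ... assignment is aligned on newdf's row index, which the first keyword fixes. A minimal sketch on an assumed three-row sample of the data:

```python
import pandas as pd

# Three rows taken from the question's data (a small assumed sample)
df = pd.DataFrame({'Position': [
    'A/C TECHNICIAN', 'ACCOUNTANT MANAGER', 'ACCOUNTS MANAGER',
]})

newdf = pd.DataFrame()
for k in ['TECHNICIAN', 'MANAGER']:
    # Selecting the 'Position' column yields a Series, so the assignment
    # is accepted, but the Series is aligned on newdf's existing row labels
    newdf[k] = df.loc[df['Position'].str.contains(k), 'Position']

print(newdf)
# The first assignment fixed newdf's index to [0] (the TECHNICIAN row),
# so both MANAGER matches (rows 1 and 2) are dropped and show up as NaN
```

This is why the matches for each keyword need their own fresh 0-based index before they are placed side by side.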

Comments:

Please add the expected output.
@HenryYik I've added the expected output. Thanks for letting me know.

Answer 1:

Create a dictionary of Series, one per keyword, and pass it to pd.concat:

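import pandas as pd

# one Series per keyword; reset_index(drop=True) makes every column start at row 0
dfs = {i: df.loc[df['Position'].str.contains(i), 'Position'].reset_index(drop=True)
       for i in name}
newdf = pd.concat(dfs, axis=1)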

print(newdf)
                 TECHNICIAN                           MANAGER  \
0             A/C TECHNICIAN                ACCOUNTANT MANAGER   
1      A/C TECHNICIAN HELPER                  ACCOUNTS MANAGER   
2                        NaN            ADMINISTRATIVE MANAGER   
3                        NaN      ADMINSTRATION OFFICE MANAGER   

                         CLERK               AUDITOR  \
0             ACCOUNTING CLERK      ACCOUNTS AUDITOR   
1         ADMINISTRATIVE CLERK                   NaN   
2      ADMISSION COUNTER CLERK                   NaN   
3                          NaN                   NaN   

                       SUPERVISOR                     SECRETARY  \
0             ACCOUNTS SUPERVISOR      ADMINISTRATION SECRETARY   
1      ADMINISTRATIVE  SUPERVISOR                           NaN   
2                             NaN                           NaN   
3                             NaN                           NaN   

                      COORDINATOR                     DIRECTOR  \
0      ADMINISTRATIVE COORDINATOR      ADMINISTRATIVE DIRECTOR   
1                             NaN                          NaN   
2                             NaN                          NaN   
3                             NaN                          NaN   

                 OFFICER SPECIALIST PROGRAMMER TYPIST LIASON DESIGNER  \
0      ADMISSION OFFICER        NaN        NaN    NaN    NaN      NaN   
1                    NaN        NaN        NaN    NaN    NaN      NaN   
2                    NaN        NaN        NaN    NaN    NaN      NaN   
3                    NaN        NaN        NaN    NaN    NaN      NaN   

  ENGINEER              ACCOUNTANT                       ADMINISTRATOR BAKER  \
0      NaN              ACCOUNTANT       ACTING HOSPITAL ADMINISTRATOR   NaN   
1      NaN      ACCOUNTANT MANAGER      ADMINISTRATOR OF MED.INSURANCE   NaN   
2      NaN                     NaN                                 NaN   NaN   
3      NaN                     NaN                                 NaN   NaN   

  COOK  
0  NaN  
1  NaN  
2  NaN  
3  NaN  

newdf.to_csv('file.csv', index=False)
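To see why reset_index(drop=True) matters, here is a self-contained sketch on a small assumed sample with a shortened keyword list (the names packed and aligned are illustrative only). Without the reset, pd.concat aligns the columns on the original row labels, so each match stays on its original row and the result is mostly NaN instead of being packed from the top.

```python
import pandas as pd

# A small slice of the question's data (assumed sample, not the full file)
df = pd.DataFrame({'Position': [
    'A/C MECHANIC', 'A/C TECHNICIAN', 'ACCOUNTANT',
    'ACCOUNTANT MANAGER', 'ACCOUNTING CLERK', 'ACCOUNTS MANAGER',
]})
name = ['TECHNICIAN', 'MANAGER', 'CLERK', 'BAKER']

# With reset_index(drop=True): each keyword's matches are renumbered 0, 1, ...
packed = pd.concat(
    {k: df.loc[df['Position'].str.contains(k), 'Position'].reset_index(drop=True)
     for k in name},
    axis=1,
)

# Without it: concat aligns on the original row labels, so every
# matching row appears once and holds a value in only one column
aligned = pd.concat(
    {k: df.loc[df['Position'].str.contains(k), 'Position'] for k in name},
    axis=1,
)

print(packed)
print(aligned)
```

Keywords with no match (BAKER here) still get a column, filled with NaN, just like SPECIALIST or COOK in the output above.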

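Comments:

@jazrael When I use this, it doesn't filter anything; it just gives me "df" back.
I've just updated my question with the output I'm trying to get.
I tried both pieces of code; unfortunately I get "df" as my output.
This works, mate. But could you help me get a new DataFrame instead of creating a new list?
Awesome.. thanks a LOOOTT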
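If a single DataFrame with one row per match is more convenient than side-by-side columns (one possible reading of the request for "a new DataFrame" in the comments), the matches can be stacked in long format instead. This is a sketch under assumptions: the sample data is shortened, and the names parts, long_df and Keyword are hypothetical.

```python
import pandas as pd

# Assumed sample of the question's data and keyword list
df = pd.DataFrame({'Position': [
    'A/C TECHNICIAN', 'ACCOUNTANT', 'ACCOUNTANT MANAGER', 'ACCOUNTS MANAGER',
]})
name = ['TECHNICIAN', 'MANAGER', 'ACCOUNTANT', 'BAKER']

# One small frame per keyword, tagged with the keyword that matched;
# regex=False treats each keyword as a literal string, not a pattern
parts = [
    df.loc[df['Position'].str.contains(k, regex=False), ['Position']].assign(Keyword=k)
    for k in name
]

# Stack them into one frame with a fresh 0-based index
long_df = pd.concat(parts, ignore_index=True)[['Keyword', 'Position']]
print(long_df)
```

Because str.contains does substring matching, a title such as ACCOUNTANT MANAGER appears once under MANAGER and once under ACCOUNTANT, exactly as in the wide output above.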
