How to delete specified words that occur in a list

Posted 2023-03-12


【Title】: How can I delete specified words that occur in a list 【Posted】: 2020-09-01 08:14:46 【Question】:

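I have a dataframe whose first column, named "original_column", contains text. Using a list, I have managed to pick specific words out of the text column "original_column", append them to another column, and remove them from the original column with the following code: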

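list1 = ['text', 'and', 'example']

# Return the first word of the text that appears in list1, or None if there is none.
finder = lambda x: next(iter([y for y in x.split() if y in list1]), None)

df['list1'] = df['original_column'].apply(finder)

# Remove the matched word (case-insensitively) from the original column.
df['original_column'] = df['original_column'].replace(regex=r'(?i)' + df['list1'], value="")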

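I now want to build on this code so that, after the listed word has been appended to the new column, only the first instance of that word is removed from "original_column". The dataframe currently looks like this: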

|   original column  |
__________________________
|   text text word   | 
--------------------------
|    and other and   |

My current code outputs the following:

|   original column   | list1
______________________________
|        word         | text
------------------------------
|        other        |  and
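
The output above is what one would expect when a regex replacement substitutes every match rather than only the first. As a rough illustration, using Python's standard `re` module on a single string instead of pandas (the variable names here are made up for this sketch), the `count` argument of `re.sub` is what separates the two behaviours:

```python
import re

text = "text text word"

# Without count, re.sub replaces every match of the pattern.
all_removed = re.sub(r"(?i)text", "", text).strip()

# With count=1, only the first match is replaced.
first_removed = re.sub(r"(?i)text", "", text, count=1).strip()

print(repr(all_removed))    # every "text" removed
print(repr(first_removed))  # only the first "text" removed
```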

I would like the output to be this:

|   original column   | list1
_______________________________
|      text word      | text
-------------------------------
|      other and      |  and

【Comments】:

【Answer 1】:

Let's use replace

df['original column']=df['original column'].replace(regex=r'(?i)'+ df['list1'],value="")
df
Out[101]: 
  original column list1
0      text text   word
1      text  text   and

【Comments】:

【Answer 2】:

Suppose the given dataframe is:

import pandas as pd
df = pd.DataFrame({"original_column": ["text text word", "text and text"]})

Use:

import re

pattern = '|'.join(rf"\s*{item}\s*" for item in list1)
regex = re.compile(pattern)

def extract_words(s):
    s['list1'] = ' '.join(map(str.strip, regex.findall(s['original_column'])))
    s['original_column'] = regex.sub(' ', s['original_column']).strip()
    return s

df = df.apply(extract_words, axis=1)
print(df)

This results in the dataframe df being:

  original_column list1
0       text text  word
1       text text   and
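
For the behaviour the question asks for (removing only the first occurrence of a listed word), here is a minimal sketch using only the standard `re` module. It is not taken from either answer: the helper name `pop_first_match`, the constant names, and the use of word boundaries are assumptions made for illustration, and the word list mirrors `list1` from the question.

```python
import re

# Hypothetical word list, mirroring list1 from the question.
WORDS = ["text", "and", "example"]

# One alternation with word boundaries, so "and" does not match inside "band".
PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, WORDS)) + r")\b",
    re.IGNORECASE,
)

def pop_first_match(text):
    """Return (remaining_text, matched_word); matched_word is None if nothing matches."""
    match = PATTERN.search(text)
    if match is None:
        return text, None
    # Cut out only the span of the first match, then collapse leftover whitespace.
    remaining = text[:match.start()] + text[match.end():]
    return " ".join(remaining.split()), match.group(0)

rows = ["text text word", "and other and"]
results = [pop_first_match(row) for row in rows]
print(results)
```

On a dataframe, one option would be to apply this helper to the text column with `Series.apply` and unpack the resulting tuples into the two columns. That step is left out here so the sketch runs without pandas.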

【Comments】:
