Python: Replacing multiple specific words from a list with re.sub

Posted 2023-02-23

Asked: 2020-06-16 05:19:22

Question:

I have the following string and a list, changewords. I want to replace 'word from list \n' with 'word from list:'. I do not want to replace every instance of '\n'.

string = "Foo \n value of something \n Bar \n Another value \n"
changewords = ["Foo", "Bar"]

Desired output:

'Foo: value of something \n Bar: Another value \n'

I have tried the following:

for i in changewords:
    tem = re.sub(f'{i} \n', f'{i}:', string)
tem
Output: 'Foo \n value of something \n Bar: Another value \n'

and

changewords2 = '|'.join(changewords)
tem = re.sub(f'{changewords2} \n', f'{changewords2}:', string)
tem
Output: 'Foo|Bar: \n value of something \n Foo|Bar: Another value \n'

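How can I get my desired output?

Comments:

You may also want to look at the other answers, @RohanGupta. Some of the examples show more of what regular expressions are intended for. Otherwise, you might as well just use "".replace().

Answer 1:

You can use this code: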

import re

string = "Foo \n value of something \n Bar \n Another value \n"
changewords = ["foo", "Bar"]

tem = string
for i in changewords:
    tem = re.sub(f'(?i){i} \n', f'{i}:', tem)
print( tem )

Output:

foo: value of something
 Bar: Another value

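Note that tem = string initializes tem. Inside the for loop, re.sub is then applied to tem and the result is assigned back to tem, so each replacement builds on the previous one.

(?i) enables case-insensitive matching.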
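A slightly more defensive variant of that loop, as a sketch only. It assumes the words might contain regex metacharacters, so each one goes through re.escape, and it passes flags=re.IGNORECASE instead of the inline (?i). The helper name add_colons is hypothetical and not part of the answer.

```python
import re

def add_colons(text, words):
    """Replace 'word \\n' with 'word:' for each word, ignoring case (hypothetical helper)."""
    result = text
    for word in words:
        # re.escape guards against words that contain regex metacharacters
        pattern = re.escape(word) + r" \n"
        # A function replacement keeps the word literal (no backslash processing)
        result = re.sub(pattern, lambda mo, w=word: f"{w}:", result, flags=re.IGNORECASE)
    return result

string = "Foo \n value of something \n Bar \n Another value \n"
print(repr(add_colons(string, ["foo", "Bar"])))
```

As in the answer, the replacement text comes from the list, so "Foo" in the input becomes "foo:" here.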


Answer 2:

Using a replacement string:

A more elegant way is this one-liner:

re.sub(rf"({'|'.join(changewords)}) \n", r"\1:", string, flags=re.I)

Demo:

>>> string = "Foo \n value of something \n Bar \n Another value \n"
>>> changewords = ['Foo', 'Bar', 'Baz', 'qux']
>>> 
>>> re.sub(rf"({'|'.join(changewords)}) \n", r"\1:", string, flags=re.I)
'Foo: value of something \n Bar: Another value \n'
>>>

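You can specify case-insensitive matching with the flags option. The replacement string can be modified to put anything you need around \1, such as a colon or a comma.

It is worth noting that you can add more than one prefix to a string literal in Python. For example, you can have both r and f, as in rf"my raw formatted string"; the order of the prefixes does not matter.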
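A small check of the prefix behaviour described above, as a sketch; the variable names are only illustrative.

```python
import re

words = ["Foo", "Bar"]

# Both prefix orders produce the same string
assert rf"({'|'.join(words)}) \n" == fr"({'|'.join(words)}) \n"

pattern = rf"({'|'.join(words)}) \n"
# The raw prefix keeps the backslash, so the regex engine (not Python)
# is what interprets \n as a newline
print(pattern)
print(len(r"\n"), len("\n"))
```

The pattern therefore contains a literal backslash followed by n, which re then treats as a newline.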

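In the expression argument of re.sub(expr, repl, string) you can specify groups. A group is formed by putting parentheses () around part of the pattern.

Groups can then be referenced in the replacement string repl with a backslash followed by the group's number; the first group is referenced by \1.

The call re.sub(rf"(A|B|C) \n", r"\1: ", string) associates \1 in the replacement string with the first group, (A|B|C), in the expression argument.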
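To see the group reference in isolation, here is a minimal sketch on a made-up input string. The \g<1> form is the same reference in the other spelling that re.sub accepts, which is handy when a digit directly follows the group number.

```python
import re

text = "A \n one \n B \n two \n"

# \1 inserts whatever the first parenthesised group matched
with_backslash = re.sub(r"(A|B|C) \n", r"\1:", text)

# \g<1> is an equivalent spelling of the same group reference
with_g = re.sub(r"(A|B|C) \n", r"\g<1>:", text)

print(repr(with_backslash))
print(with_backslash == with_g)
```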

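Using a replacement function:

Suppose you want to replace words in the target string with other words taken from a dictionary. For example, you want to replace "Bar" with "Hank" and "Foo" with "Bernard". This can be done with a replacement function instead of a replacement string: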

>>> repl_dict = {'Foo':'Bernard', 'Bar':'Hank'}
>>> 
>>> expr = rf"({'|'.join(repl_dict.keys())}) \n"   # Becomes '(Foo|Bar) \\n'
>>>
>>> func = lambda mo: f"{repl_dict[mo.group(1)]}:"
>>> 
>>> re.sub(expr, func, string, flags=re.I)
'Bernard: value of something \n Hank: Another value \n'
>>>

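This could be written as another one-liner, but I broke it up for clarity.

The lambda function takes the match object, mo, that is passed to it and extracts the text of the first group. The first group in the regular expression is the text enclosed by (), as in (A|B|C).

The replacement function refers to this first group with mo.group(1); in the earlier example, the replacement string referred to it with \1.

The replacement function then does a lookup in the dict and returns the final replacement string for the match.

Comments: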
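One caveat with the dictionary lookup: with flags=re.I the matched text may differ in case from the dictionary key (for example "foo" against the key 'Foo'), and repl_dict[mo.group(1)] would then raise KeyError. A sketch of one way around that, assuming lower-casing the keys is acceptable; lookup and replace_word are hypothetical names.

```python
import re

repl_dict = {"Foo": "Bernard", "Bar": "Hank"}
# Lower-cased lookup table so matches found with re.I still hit a key
lookup = {key.lower(): value for key, value in repl_dict.items()}

# re.escape protects keys that contain regex metacharacters
expr = rf"({'|'.join(re.escape(k) for k in repl_dict)}) \n"

def replace_word(mo):
    # mo.group(1) is the text matched by the first group, in its original case
    return f"{lookup[mo.group(1).lower()]}:"

text = "foo \n value of something \n BAR \n Another value \n"
result = re.sub(expr, replace_word, text, flags=re.I)
print(repr(result))
```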

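Is it possible to add ':' after the word?

You want a colon after the replacement?

Yes. The output should look like "Foo: Value of something \n Bar: Another Value \n".

The fourth argument of re.sub is count and the fifth is flags.

The subtle difference between the for loop and this approach is that, with case-insensitive matching, this approach outputs Foo: value of something (because the original text has Foo), whereas the loop outputs foo: value of something, where foo comes from the replacement list.

Answer 3:

You can do this without regular expressions entirely. My first approach would be the built-in string method .replace(), which looks like this: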

string = "Foo \n value of something \n Bar \n Another value \n"
changewords = ["Foo", "Bar"]

for word in changewords:
   to_replace = "{0} \n".format(word)
   replacement = "{0}:".format(word)
   string = string.replace(to_replace, replacement)

Hope this helps!

Comments:

Definitely helpful! However, I would also like to use re.IGNORECASE as I do in my re.sub call. Is it possible to use IGNORECASE with the .replace() function?
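For what it is worth, str.replace() takes only old, new and an optional count, so there is no flag for ignoring case. A sketch of a case-insensitive stand-in built on re.sub, assuming the search text should be matched literally; replace_ignore_case is a hypothetical helper name.

```python
import re

def replace_ignore_case(text, old, new):
    """Case-insensitive counterpart of text.replace(old, new); hypothetical helper."""
    # re.escape makes `old` match literally; the lambda keeps `new` literal too
    return re.sub(re.escape(old), lambda mo: new, text, flags=re.IGNORECASE)

string = "Foo \n value of something \n Bar \n Another value \n"
for word in ["foo", "bar"]:
    string = replace_ignore_case(string, f"{word} \n", f"{word}:")
print(repr(string))
```

Note that the replacement uses the casing from the list, not the casing found in the original text.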
