How do I replace multiple substrings of a string?
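I want to use the .replace function to replace multiple strings.
I currently have
string.replace("condition1", "")
but would like to have something like
string.replace("condition1", "").replace("condition2", "text")
although that doesn't feel like good syntax.
What is the proper way to do this? Something like how in grep/regex you can use \1 and \2 to substitute fields into a search string.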
Here is a short example that should do the trick with regular expressions:
import re
rep = {"condition1": "", "condition2": "text"}  # define desired replacements here
# use these three lines to do the replacement
# (Python 3 renamed dict.iteritems to dict.items; on Python 2 use rep.iteritems())
rep = dict((re.escape(k), v) for k, v in rep.items())
pattern = re.compile("|".join(rep.keys()))
text = pattern.sub(lambda m: rep[re.escape(m.group(0))], text)
For example:
>>> pattern.sub(lambda m: rep[re.escape(m.group(0))], "(condition1) and --condition2--")
'() and --text--'
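One detail the snippet above leaves open is what happens when one key is a prefix of another (say "cond" and "condition1"): regex alternation tries the alternatives from left to right, so the shorter key can win. Below is a minimal sketch for Python 3, assuming you want the longest key to take precedence. The function name `replace_all` is made up for this illustration and is not part of the original answer.

```python
import re

def replace_all(text, rep):
    """Replace every key of `rep` found in `text` with its value, in one pass."""
    if not rep:
        # An empty alternation would match the empty string everywhere.
        return text
    # Longest keys first, so "condition1" is tried before "cond".
    keys = sorted(rep, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # The matched text is the original (unescaped) key, so look it up directly.
    return pattern.sub(lambda m: rep[m.group(0)], text)

print(replace_all("(condition1) and --condition2--",
                  {"cond": "X", "condition1": "", "condition2": "text"}))
# prints: () and --text--
```

Escaping the keys only while building the pattern also avoids having to re-escape the match inside the lambda.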
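I needed a solution where the strings to be replaced can be regular expressions, for example to help normalize a long text by replacing runs of whitespace characters with a single one. Building on a chain of answers from others, including MiniQuark and mmj, this is what I came up with: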
def multiple_replace(string, reps, re_flags=0):
    """ Transforms string, replacing keys from reps with values.
    reps: dictionary, or list of key-value pairs (to enforce ordering;
          earlier items have higher priority).
          Keys are used as regular expressions.
    re_flags: interpretation of regular expressions, such as re.DOTALL
    """
    if isinstance(reps, dict):
        reps = reps.items()
    reps = list(reps)  # must be indexable; dict.items() is only a view in Python 3
    pattern = re.compile("|".join("(?P<_%d>%s)" % (i, re_str[0])
                                  for i, re_str in enumerate(reps)),
                         re_flags)
    return pattern.sub(lambda x: reps[int(x.lastgroup[1:])][1], string)
It works for the examples given in other answers, for example:
>>> multiple_replace("(condition1) and --condition2--",
... {"condition1": "", "condition2": "text"})
'() and --text--'
>>> multiple_replace('hello, world', {'hello' : 'goodbye', 'world' : 'earth'})
'goodbye, earth'
>>> multiple_replace("Do you like cafe? No, I prefer tea.",
... {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'
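The main thing for me is that you can use regular expressions as well, for example to replace whole words only, or to normalize whitespace: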
>>> s = "I don't want to change this name:\n Philip II of Spain"
>>> re_str_dict = {r'\bI\b': 'You', r'[\n\t ]+': ' '}
>>> multiple_replace(s, re_str_dict)
"You don't want to change this name: Philip II of Spain"
If you want to use the dictionary keys as plain strings, you can escape them before calling multiple_replace, e.g. with this function:
def escape_keys(d):
    """ transform dictionary d by applying re.escape to the keys """
    return dict((re.escape(k), v) for k, v in d.items())
>>> multiple_replace(s, escape_keys(re_str_dict))
"I don't want to change this name:\n Philip II of Spain"
The following function can help you find malformed regular expressions among your dictionary keys (since the error message from multiple_replace isn't very informative):
def check_re_list(re_list):
    """ Checks if each regular expression in list is well-formed. """
    for i, e in enumerate(re_list):
        try:
            re.compile(e)
        except (TypeError, re.error):
            print("Invalid regular expression string "
                  "at position {}: '{}'".format(i, e))
>>> check_re_list(re_str_dict.keys())
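Note that it does not chain the replacements; it performs them simultaneously. This makes it more efficient without restricting what it can do. To mimic the effect of chaining, you may just need to add more replacement pairs and make sure the pairs are in the intended order: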
>>> multiple_replace("button", {"but": "mut", "mutton": "lamb"})
'mutton'
>>> multiple_replace("button", [("button", "lamb"),
... ("but", "mut"), ("mutton", "lamb")])
'lamb'
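To make the difference between simultaneous and chained replacement concrete, here is a small sketch that swaps two words both ways. The helper names `swap_simultaneously` and `swap_chained` are hypothetical, and the keys are treated as plain strings (escaped) rather than as regular expressions.

```python
import re

def swap_simultaneously(text, rep):
    """One regex pass: each match is replaced once and never rescanned."""
    pattern = re.compile("|".join(re.escape(k) for k in rep))
    return pattern.sub(lambda m: rep[m.group(0)], text)

def swap_chained(text, rep):
    """Repeated str.replace: later replacements see earlier results."""
    for old, new in rep.items():
        text = text.replace(old, new)
    return text

rep = {"cafe": "tea", "tea": "cafe"}
sentence = "I like cafe, not tea."
print(swap_simultaneously(sentence, rep))  # I like tea, not cafe.
print(swap_chained(sentence, rep))         # I like cafe, not cafe.
```

In the chained version the first step turns "cafe" into "tea", and the second step then converts every "tea" back, including the one that was just produced.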
Starting with Python 3.8, and the introduction of assignment expressions (PEP 572) (the := operator), we can apply the replacements within a list comprehension:
# text = "The quick brown fox jumps over the lazy dog"
# replacements = [("brown", "red"), ("lazy", "quick")]
[text := text.replace(a, b) for a, b in replacements]
# text = 'The quick red fox jumps over the quick dog'
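The comprehension above is run only for its side effect on `text`, and it builds a throwaway list of the intermediate strings. If that bothers you, an explicit loop gives the same result and also runs on versions before Python 3.8. This is only a sketch; `apply_in_order` is a name invented here.

```python
def apply_in_order(text, replacements):
    """Apply (old, new) pairs one after another with str.replace."""
    for old, new in replacements:
        text = text.replace(old, new)
    return text

text = "The quick brown fox jumps over the lazy dog"
replacements = [("brown", "red"), ("lazy", "quick")]
print(apply_in_order(text, replacements))
# prints: The quick red fox jumps over the quick dog
```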
I suggest the code should be, for example:
z = "My name is Ahmed, and I like coding "
print(z.replace(" Ahmed", " Dauda").replace(" like", " Love" ))
It will print the string with all the requested changes applied.
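You really shouldn't do it this way, but I find it too cool not to share:
>>> replacements = {'cond1':'text1', 'cond2':'text2'}
>>> cmd = 'answer = s'
>>> for k, v in replacements.items():
...     cmd += ".replace(%r, %r)" % (k, v)  # %r keeps the quotes around the strings
...
>>> exec(cmd)
Now answer is the result of all the replacements applied in turn.
Again, this is very hacky and not something you should use regularly. But it's nice to know that you can do something like this if you ever need to.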
Here is a sample that should be more efficient on long strings with many small replacements.
import re
source = "Here is foo, it does moo!"
replacements = {
    'is': 'was',  # replace 'is' with 'was'
    'does': 'did',
    '!': '?'
}
def replace(source, replacements):
    finder = re.compile("|".join(re.escape(k) for k in replacements.keys()))  # matches every string we want replaced
    result = []
    pos = 0
    while True:
        match = finder.search(source, pos)
        if match:
            # cut off the part up until match
            result.append(source[pos : match.start()])
            # cut off the matched part and replace it in place
            result.append(replacements[source[match.start() : match.end()]])
            pos = match.end()
        else:
            # the rest after the last match
            result.append(source[pos:])
            break
    return "".join(result)
print(replace(source, replacements))
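The point is to avoid many concatenations of long strings. We chop the source string into fragments, replace some of the fragments as we build the list, and then join the whole thing back into a string.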
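The same cut-and-join idea can be written a little more compactly with `re.finditer`, which yields the non-overlapping matches from left to right. This is a sketch of that variant rather than the code from the answer above; `replace_pieces` is a hypothetical name.

```python
import re

def replace_pieces(source, replacements):
    """Collect unchanged and replaced fragments in a list, then join once."""
    if not replacements:
        return source
    finder = re.compile("|".join(re.escape(k) for k in replacements))
    pieces = []
    pos = 0
    for match in finder.finditer(source):
        pieces.append(source[pos:match.start()])     # text before the match
        pieces.append(replacements[match.group(0)])  # the replacement itself
        pos = match.end()
    pieces.append(source[pos:])                      # tail after the last match
    return "".join(pieces)

print(replace_pieces("Here is foo, it does moo!",
                     {"is": "was", "does": "did", "!": "?"}))
# prints: Here was foo, it did moo?
```

`pattern.sub` with a function as the replacement produces the same result in one call. The explicit list is mainly useful if you want to inspect or post-process the fragments.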
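I don't know about speed, but this is my workaday quick fix:
from functools import reduce  # reduce is no longer a builtin in Python 3
reduce(lambda a, b: a.replace(*b)
    , [('o','W'), ('t','X')] #iterable of pairs: (oldval, newval)
    , 'tomato' #The string from which to replace values
    )
...but I do like the #1 regex answer above. Note: if one new value is a substring of another one, the operation is not commutative, so the order of the pairs matters.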
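A quick sketch of why the order of the pairs matters with this approach: when the output of one pair can be matched by a later pair, swapping the pairs changes the result. The wrapper name `replace_in_order` is invented for the example.

```python
from functools import reduce

def replace_in_order(text, pairs):
    """Fold str.replace over the (old, new) pairs, left to right."""
    return reduce(lambda acc, pair: acc.replace(*pair), pairs, text)

print(replace_in_order("tomato", [("o", "W"), ("t", "X")]))  # XWmaXW
print(replace_in_order("tomato", [("t", "o"), ("o", "t")]))  # ttmatt
print(replace_in_order("tomato", [("o", "t"), ("t", "o")]))  # oomaoo
```

The first call is order-independent because neither new value contains an old one. The last two differ because each replacement produces characters that the next one consumes.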
Or just for a quick hack:
for line in to_read:
    read_buffer = line
    stripped_buffer1 = read_buffer.replace("term1", " ")
    stripped_buffer2 = stripped_buffer1.replace("term2", " ")
    write_to_file = to_write.write(stripped_buffer2)
Here is another way of doing it with a dictionary (it replaces whole words only):
listA = "The cat jumped over the house".split()
modify = {word: word for word in listA}  # start with every word mapped to itself
modify["cat"], modify["jumped"] = "dog", "walked"
print(" ".join(modify[x] for x in listA))
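Starting from Andrew's valuable answer, I developed a script that loads the dictionary from a file and processes all the files in the current folder to do the replacements. The script loads the mappings from an external file in which you can set the separator. I'm a beginner, but I found this script very useful for doing multiple substitutions in multiple files. It loaded a dictionary with more than 1000 entries in seconds. It is not elegant, but it worked for me.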
import glob
import re
mapfile = input("Enter map file name with extension eg. codifica.txt: ")
sep = input("Enter map file column separator eg. |: ")
mask = input("Enter search mask with extension eg. 2010*txt for all files to be processed: ")
suff = input("Enter suffix with extension eg. _NEW.txt for newly generated files: ")
rep = {}  # creation of empty dictionary
with open(mapfile) as temprep:  # load the definitions into the dictionary from the input file, using the prompted separator
    for line in temprep:
        (key, val) = line.strip('\n').split(sep)
        rep[key] = val
# keys are escaped only while building the pattern, so the mapping may contain re reserved characters
pattern = re.compile("|".join(re.escape(k) for k in rep))
for filename in glob.iglob(mask):  # iterate over all the files matching the prompted mask
    with open(filename, "r") as textfile:  # load each file into the variable text
        text = textfile.read()
    # start replacement
    text = pattern.sub(lambda m: rep[m.group(0)], text)
    # write the output file with the prompted suffix (assumes a 3-letter extension such as .txt)
    with open(filename[:-4] + suff, "w") as target:
        target.write(text)