Search and replace a string in a text file

【Title】: Search and Replace a string in text file 【Posted】: 2020-05-12 20:01:41 【Question】:

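I want to search a text file for a line containing the string "SECTION=C-BEAM" and replace the first 13 characters of the next line with a value derived from a pattern read from the first line. The pattern was highlighted in bold in the original post; in the example below, 1.558 is read from the first line and 1.558/2 = 0.779 is written into the second line. The number read from the first line is always located between the strings "H_" and "H_0".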

Sample input:

SECTION, ELSET=DIORH_1_558H_0_76W_241_1, SECTION=C-BEAM, MAT=XYZ;
0., 1,  2,  3,  4,  5

The output should be:

SECTION, ELSET=DIORH_1_558H_0_76W_241_1, SECTION=C-BEAM, MAT=XYZ;
0.779,  1,  2,  3,  4,  5

This is what I have tried so far.

file_in = open(test_input, 'rb')
file_out = open(test_output, 'wb')
lines = file_in.readlines()
print ("Total no. of lines to process: ", len(lines))
for i in range(len(lines)):
    if lines.startswith("SECTION") and "SECTION=C-BEAM" in lines:
    start_index = lines.find("H_")+1
    end_index = lines.find("H_0")
    x = lines[start_index:end_index]/2.0
    print (x)
    lines[i+1]= lines[i+1].replace("          0.",x)+lines[i+1][13:]
file_out.write(lines[i])
file_in.close()
file_out.close()

【Question comments】:

What have you tried so far? Where are you stuck?

【Answer 1】:

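Since you mentioned that the content lives in a file, I stored the lines you are looking for in a string together with a few unrelated lines. I tested the code below and it works. I am assuming the file contains only one such occurrence; if there are several, they can be handled with a loop.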

import re

st = '''These are some different lines - you need not worry about.
SECTION, ELSET=DIORH_1_558H_0_76W_241_1, SECTION=C-BEAM, MAT=XYZ;
0., 1,  2,  3,  4,  5
These are more different lines - you need not worry about.
0.,2 numbers'''

num = str(float(re.findall('.*H_(.+)H_0.*SECTION=C-BEAM.*\n.*',st)[0].replace("_","."))/2)
print (re.sub(r'(.*SECTION=C-BEAM.*\n)(0\.)(,.*)',r'\g<1>'+num+r'\g<3>',st))

# re.findall('.*H_(.+)H_0.*SECTION=C-BEAM.*\n.*', st) returns ['1_558']; indexing with [0] extracts '1_558'.
# Then "_" is replaced with ".", the result is converted to a float, divided by 2, and converted back to a string.
# ".*" means 0 or more non-newline characters, ".+" means 1 or more non-newline characters, and "\n" stands for a newline.
# "(.+)" is a capture group: the characters it matches are extracted from the overall pattern.
# Second line of the code: the "0." at the start of the line after the header is replaced with the computed number.
# The pattern is divided into 3 groups: 1) everything before "0.", 2) "0." itself, 3) everything after "0.".
# The whole match is replaced with "group 1 + num + group 3".

Output: [screenshot in the original post]
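The answer mentions that several occurrences could be handled with a loop. One alternative is to pass a function to `re.sub`, which then rewrites every matching header/value pair in a single pass. The following is only a sketch under stated assumptions: the names `PATTERN` and `halve_sections` are made up for illustration, and the number between "H_" and "H_0" is assumed to consist of digits with at most one underscore standing in for the decimal point.

```python
import re

# Assumed layout: a header line containing H_<digits>[_<digits>]H_0 and
# SECTION=C-BEAM, followed by a line that starts with "0." and a comma.
# Group 1: the whole header line (including its newline)
# Group 2: the encoded number, e.g. "1_558"
# Group 3: any indentation in front of "0." on the following line
PATTERN = re.compile(
    r'^(.*?H_(\d+(?:_\d+)?)H_0.*SECTION=C-BEAM.*\n)([ \t]*)0\.(?=,)',
    re.MULTILINE,
)


def halve_sections(text):
    """Replace the leading "0." after every C-BEAM header with half its H value."""
    def _swap(match):
        # "1_558" -> 1.558 -> 0.779
        half = float(match.group(2).replace("_", ".")) / 2
        return match.group(1) + match.group(3) + str(half)

    return PATTERN.sub(_swap, text)
```

Lines that start with "0." but do not directly follow a C-BEAM header are left unchanged, because the header line is part of the match.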

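【Comments】:

Hi, thanks for the reply. There seems to be a typo in the calculation of num. What is the st)[0] in the first line of the code?

No, it is not a typo. It is there on purpose: re.findall('.*H_(.*)H_0.*SECTION=C-BEAM.*\n.*',st) produces a list, and I use indexing to get its first element (even though there is only one element). You can test the code in Jupyter or another tool and tell me whether it works.

@user2001139: Edited the code: added a detailed explanation and a picture of the output.

【Answer 2】: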

A basic Python regular expression should do this:

import re

my_text = """SECTION, ELSET=DIORH_1_558H_0_76W_241_1, SECTION=C-BEAM, MAT=XYZ;\n0., 1,  2,  3,  4,  5"""

# Find the index of the first occurrence of the marker substring in my_text
index = my_text.find('SECTION=C-BEAM')

# Take everything before that occurrence and count the newlines
# ("\n" is the newline escape sequence) to get the marker's line number
nb_line = my_text[:index].count('\n')

# Now find the index where line n + 1 begins.
# re.finditer yields every match of a regex; collecting m.start() gives
# the index of every newline. Select entry nb_line (Python indexing
# starts at 0), i.e. the newline that ends the marker line.
index = [m.start() for m in re.finditer(r"\n", my_text)][nb_line]

# Then rebuild the string from:
# the beginning of the text up to the start of line n + 1,
# the new text (0.779),
# the text after the removed substring (its length must be known, here 2)

string_to_remove = "0."
my_text = my_text[:index+1] + '0.779' + my_text[index + 1 + len(string_to_remove):]

print(my_text)
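The snippet above hard-codes the replacement value 0.779. As a possible refinement, the value could be computed from the header line itself. This is a sketch only: `half_height` is a hypothetical helper name, and it assumes the number between "H_" and "H_0" is written as digits with at most one underscore in place of the decimal point.

```python
import re


def half_height(header_line):
    """Return half of the number encoded between "H_" and "H_0", as a string.

    Returns None when the line contains no such number.
    """
    match = re.search(r'H_(\d+(?:_\d+)?)H_0', header_line)
    if match is None:
        return None
    # "1_558" -> "1.558" -> 1.558 -> 0.779 -> "0.779"
    return str(float(match.group(1).replace("_", ".")) / 2)
```

The returned string could then be used in place of the literal '0.779' when the text is rebuilt.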

【Comments】:

I tried the following and got two errors; I don't know what is wrong with the startswith method. Could you help, Louis? My skills are rather limited. for i in range(len(lines)): if lines.startswith("SECTION") and "SECTION=C-BEAM" in lines: start_index = lines.find("H_")+1 end_index = lines.find("H_0") x = lines[start_index:end_index]/2.0 print (x) lines[i+1]=lines[i+1].replace("0.",x)+lines[i+1][13:] file_out.write(lines[i]) file_in.close() file_out.close()
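On the errors in the loop above: `lines` is a list, and a list has no `startswith` or `find` method, so each call needs to go through `lines[i]`. The sliced text is also a string, so it cannot be divided by 2.0 until the underscore is replaced with a dot and the result is converted with `float`. A corrected version might look like the sketch below. The function names `process_lines` and `process_file` are made up for illustration, the files are assumed to be opened in text mode (the original used 'rb'/'wb', which yields bytes), and the line after the header is assumed to start with "0." after optional indentation.

```python
def process_lines(lines):
    """Return a copy of lines with the "0." after each C-BEAM header filled in."""
    out = list(lines)
    for i in range(len(out) - 1):
        line = out[i]  # index the list: the list itself has no startswith()
        if line.startswith("SECTION") and "SECTION=C-BEAM" in line:
            start = line.find("H_") + 2   # skip both characters of "H_"
            end = line.find("H_0", start)
            if end == -1:
                continue                  # no encoded number on this header
            # "1_558" -> 1.558 -> 0.779
            half = float(line[start:end].replace("_", ".")) / 2
            nxt = out[i + 1]
            body = nxt.lstrip()
            if body.startswith("0."):
                indent = nxt[:len(nxt) - len(body)]
                out[i + 1] = indent + str(half) + body[2:]
    return out


def process_file(src_path, dst_path):
    """Read src_path, rewrite the lines, and write every line to dst_path."""
    with open(src_path, "r") as src:   # text mode, so lines are str, not bytes
        lines = src.readlines()
    with open(dst_path, "w") as dst:
        dst.writelines(process_lines(lines))
```

Unlike the original loop, this writes all lines to the output file rather than only the last one, and the `with` blocks close both files automatically.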
