Getting the index of the word 'print' in a multiline string


【Posted】: 2021-05-03 22:42:11

【Question】:

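I am trying to find the index of every occurrence of the word 'print' in a multiline text, but I have run into a problem:

If a line contains 'print' twice, the code returns the index of the first 'print' twice. It never finds the second 'print' on the same line; it just prints the index of the first one again.

My code is:
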
text = '''print is print as
it is the function an
print is print and not print
'''

text_list = []

for line in text.splitlines():

    #'line' represents each line in the multiline string
    text_list.append([])

    for letter in line:
        # Append each letter of the line to the newest inner list of text_list
        text_list[len(text_list)-1].append(letter)

for line in text_list:
    for letter in line:

        # Check whether 'p' is followed by 'r', then 'i', then 'n', and finally 't'
        if letter == "p":
            num = 1

            if text_list[text_list.index(line)][line.index(letter)+num] == 'r':
                num += 1
                
                if text_list[text_list.index(line)][line.index(letter)+num] == 'i':
                    num += 1

                    if text_list[text_list.index(line)][line.index(letter)+num] == 'n':
                        num += 1

                        if text_list[text_list.index(line)][line.index(letter)+num] == 't':
                            num += 1
                            print(f'index (start,end) = {text_list.index(line)}.{line.index(letter)}, {text_list.index(line)}.{line.index(letter)+num}')
                        

When I run it, it prints:

index (start,end) = 0.0, 0.5 #returns the index of the first 'print' in first line
index (start,end) = 0.0, 0.5 #returns the index of the first 'print' in first line instead of the index of the second print
index (start,end) = 2.0, 2.5 #returns the index of the first 'print' in third line
index (start,end) = 2.0, 2.5 #returns the index of the first 'print' in third line instead of the index of the second print
index (start,end) = 2.0, 2.5 #returns the index of the first 'print' in third line instead of the index of the third print

As you can see, the indices in the result are repeated. Here is text_list:

>>> text_list
[['p', 'r', 'i', 'n', 't', ' ', 'i', 's', ' ', 'p', 'r', 'i', 'n', 't', ' ', 'a', 's'],
['i', 't', ' ', 'i', 's', ' ', 't', 'h', 'e', ' ', 'f', 'u', 'n', 'c', 't', 'i', 'o', 'n', ' ', 'a', 'n'],
['p', 'r', 'i', 'n', 't', ' ', 'i', 's', ' ', 'p', 'r', 'i', 'n', 't', ' ', 'a', 'n', 'd', ' ', 'n', 'o', 't', ' ', 'p', 'r', 'i', 'n', 't']]
>>>

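Each list in text_list is one line of text. There are three lines, so text_list contains three lists. How can I get the index of the second 'print' in the first line, and of the second and third 'print' in the third line? As you can see, the code only returns the index of the first 'print' in the first and third lines.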

【Comments】:

【Answer 1】:
import re

text = '''print is print as
it is the function an
print is print and not print
'''

for line_number, line in enumerate(text.split('\n')):
    occurrences = [m.start() for m in re.finditer('print', line)]

    if occurrences:
        for occurrence in occurrences:
            print('Found `print` at character %d on line %d' % (occurrence, line_number + 1))

->

Found `print` at character 0 on line 1
Found `print` at character 9 on line 1
Found `print` at character 0 on line 3
Found `print` at character 9 on line 3
Found `print` at character 23 on line 3
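
For context, the repeated indices in the question come from list.index(), which always returns the position of the first matching element, so every 'p' on a line resolves to the first 'p'. The sketch below keeps the question's character-by-character idea but takes positions from enumerate instead; the names target and matches are illustrative and not part of the original code.

```python
text = '''print is print as
it is the function an
print is print and not print
'''

target = 'print'
matches = []  # (line number, start column, end column), all zero-based

for line_no, line in enumerate(text.splitlines()):
    for col, letter in enumerate(line):
        # enumerate gives the real column, so no .index() lookup is needed;
        # startswith(target, col) checks for the whole word starting at col
        if line.startswith(target, col):
            matches.append((line_no, col, col + len(target)))

for line_no, start, end in matches:
    print(f'index (start,end) = {line_no}.{start}, {line_no}.{end}')
```

This prints one line per occurrence in the same line.column format as the question, including the second and third 'print' on a line.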

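【Comments】:

Thanks, I didn't know there was a function called 'enumerate'. It works great.

【Answer 2】:

Strings already have an index method for finding a substring, and you can pass an extra start argument to find the next occurrence of that substring: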

>>> text = '''print is print as
it is the function an
print is print and not print
'''
>>> text.index("print")
0
>>> text.index("print",1)
9
>>> text.index("print",10)
40
>>> text.index("print",41)
49
>>> text.index("print",50)
63
>>> text.index("print",64)
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    text.index("print",64)
ValueError: substring not found
>>> 
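
The manual calls above can be wrapped in a loop. This is a minimal sketch, not part of the original answer: the helper name find_all is made up for illustration, and it uses str.find, which returns -1 instead of raising ValueError when nothing more is found.

```python
def find_all(haystack, needle):
    """Return the start offset of every non-overlapping occurrence of needle."""
    if not needle:
        raise ValueError("needle must not be empty")
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        # Resume the search just past the match that was found
        start = haystack.find(needle, start + len(needle))
    return positions


text = '''print is print as
it is the function an
print is print and not print
'''

offsets = find_all(text, "print")
print(offsets)  # [0, 9, 40, 49, 63]
```

The offsets count from the start of the whole string, newlines included, which is why the third-line matches start at 40.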

【Comments】:

【Answer 3】:

You can use a regular expression:

import re

text = '''print is print as
it is the function an
print is print and not print
'''

for i in re.finditer("print", text):
    print(i.start())

# OR AS A LIST

[i.start() for i in re.finditer("print", text)]
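
Two possible refinements, offered as a sketch rather than as part of the original answer: the pattern \bprint\b matches only the whole word (so 'printer' is skipped), and an absolute offset can be converted to a line and column by counting newlines. The helper name line_and_column is an assumption made for this example.

```python
import re

text = '''print is print as
it is the function an
print is print and not print
'''

# \b is a word boundary, so this does not match inside 'printer' or 'sprint'
WORD = re.compile(r'\bprint\b')


def line_and_column(source, offset):
    """Convert an absolute offset into a zero-based (line, column) pair."""
    line_no = source.count('\n', 0, offset)
    # rfind returns -1 on the first line, which makes line_start 0
    line_start = source.rfind('\n', 0, offset) + 1
    return line_no, offset - line_start


positions = [line_and_column(text, m.start()) for m in WORD.finditer(text)]
print(positions)  # [(0, 0), (0, 9), (2, 0), (2, 9), (2, 23)]
```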

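【Comments】:

【Answer 4】:

You were on the right track initially when you split the text into lines. The next step is to use the split() method to break each line into words rather than letters. You can then easily get the index of every 'print' string on each line. Note that these are word positions within the line, not character offsets.

The following code prints the word indices as a list of lists, with each inner list corresponding to one line: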

text = '''print is print as
it is the function an
print is print and not print
'''

index_list = []
for line in text.splitlines():
    index_list.append([])
    for idx, word in enumerate(line.split()):
        if word == 'print':
            index_list[-1].append(idx)

print(index_list)

#[[0, 2], [], [0, 2, 5]]
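
If both the word position and the character offset are wanted, one option (a sketch under that assumption, not something the answer itself proposes) is to iterate over the words with re.finditer(r'\S+', line), which yields the same whitespace-separated words as split() but keeps each word's start column. The name hits is illustrative.

```python
import re

text = '''print is print as
it is the function an
print is print and not print
'''

hits = []  # (line number, word index, character column), all zero-based
for line_no, line in enumerate(text.splitlines()):
    # \S+ matches each run of non-whitespace characters, i.e. each word
    for word_idx, m in enumerate(re.finditer(r'\S+', line)):
        if m.group() == 'print':
            hits.append((line_no, word_idx, m.start()))

print(hits)
# [(0, 0, 0), (0, 2, 9), (2, 0, 0), (2, 2, 9), (2, 5, 23)]
```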

【Comments】:
