How to split a string at a specific character and build different string combinations [closed]
Posted: 2021-02-14 04:04:35
Question: I have some strings in a text file that I want to process. I have tried many regex patterns, but none of them worked for me.
someone can tell/figure
a/the squeaky wheel gets the grease/oil
accounts for (someone or something)
that's/there's (something/someone) for you
I need the following string combinations:
someone can tell
someone can figure
a squeaky wheel gets the grease
a squeaky wheel gets the oil
the squeaky wheel gets the grease
the squeaky wheel gets the oil
accounts for someone
accounts for something
that's something for you
that's someone for you
there's something for you
there's someone for you
Question comments:
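Please show your attempt and explain in detail how it fails.
I have a similar problem: I want to split the string on "/" or "or", and then join the pieces with the preceding or corresponding text.
@Mujtaba That makes it quite likely that you have the same homework and are in the same class.
Getting the final sentences right takes a lot of work on the rest of the string (whether the alternative is at the start or end of the sentence, whether it sits between spaces or brackets, and so on).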
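Answer 1:
Edit: fixed the brackets and "or" cases that I missed in the previous version.
A simple recursive solution that also works with multiple slashes (he/she/it/whatever):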
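import re
def explode_versions(s):
    # head, then alternatives separated by "/" or " or ", then tail.
    # Python keeps only the last repetition of a repeated group, so for "a/b/c"
    # versions is ["a/b", "c"]; the recursion below splits "a/b" further.
    match = re.search(r'^(.*?)(\S+)(?:(?:(?: or )|/)(\S+))+(.*?)$', s)
    if match:
        head, *versions, tail = match.groups()
        # Drop the brackets that enclose the group of alternatives
        versions[0] = re.sub(r'^\(', '', versions[0])
        versions[-1] = re.sub(r'\)$', '', versions[-1])
        return [line for v in versions for line in explode_versions(''.join([head, v, tail]))]
    else:
        return [s]
texts = ["someone can tell/figure",
         "a/the squeaky wheel gets the grease/oil",
         "accounts for (someone or something)",
         "that's/there's (something/someone) for you"]
print([explode_versions(text) for text in texts])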
Result:
[['someone can tell', 'someone can figure'],
['a squeaky wheel gets the grease',
'a squeaky wheel gets the oil',
'the squeaky wheel gets the grease',
'the squeaky wheel gets the oil'],
['accounts for someone', 'accounts for something'],
["that's something for you",
"that's someone for you",
"there's something for you",
"there's someone for you"]]
Comments:
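You still need to add some smaller corrections, depending on whether " or " can appear in strings without brackets, and so on. But all of this depends heavily on the data.
Sir, how do I print each result on a new line, without the brackets?
[ele for list_ in result for ele in list_]
When I read the same data from a text file, I get TypeError: expected string or bytes-like object.
How are you reading the data?
Answer 2: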
You can use a Cartesian product:
from itertools import product
import re
s = 'a/the squeaky wheel gets the grease/oil'
# Split out the slashed groups; the capturing group keeps them in the result
lst = [i.split('/') for i in re.split(r"([\w']+(?:/[\w']+)+)", s) if i]
# [['a', 'the'], [' squeaky wheel gets the '], ['grease', 'oil']]
print([''.join(i) for i in product(*lst)])
Output:
['a squeaky wheel gets the grease',
'a squeaky wheel gets the oil',
'the squeaky wheel gets the grease',
'the squeaky wheel gets the oil']
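The product approach can be extended to all four example lines. This is a sketch under one assumption: a bracketed group such as "(someone or something)" or "(something/someone)" is just a list of alternatives, so it is first rewritten to a bare "someone/something" before splitting. Apostrophes are allowed inside words so that "that's/there's" stays one group. The function name expand is illustrative.

```python
import re
from itertools import product


def expand(line):
    # Assumption: "(x or y)" and "(x/y)" mean the same thing; rewrite both to "x/y"
    line = re.sub(r'\(([^()]*)\)', lambda m: m.group(1).replace(' or ', '/'), line)
    # The capturing group makes re.split keep the slashed groups as separate parts
    parts = re.split(r"([\w']+(?:/[\w']+)+)", line)
    # Each non-empty part becomes a list of choices (a single choice for plain text)
    choices = [part.split('/') for part in parts if part]
    # Cartesian product: one pick from every slot, joined back together
    return [''.join(combo) for combo in product(*choices)]


texts = ["someone can tell/figure",
         "a/the squeaky wheel gets the grease/oil",
         "accounts for (someone or something)",
         "that's/there's (something/someone) for you"]
for text in texts:
    print(expand(text))
```

Because product varies the last slot fastest, the combinations come out in the same order as in the question.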
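Comments:
Does it work for the other lines too? I also can't read from the text file.
It should work for the other lines. Look up questions about how to read from a text file.
Answer 3:
This is a bit tricky, but the main idea is to duplicate the options built so far whenever you reach a "/" and keep track of both alternatives. Take a look: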
m_str = ["someone can tell/figure",
         "a/the squeaky wheel gets the grease/oil",
         "accounts for (someone or something)",
         "that's/there's (something/someone) for you"]
lines = []
for line in m_str:
    # Each option is the list of words built so far for one variant
    options = [[]]
    for word in line.split(" "):
        if "/" in word:
            # Branch: copy every option once per alternative
            options = [option + [alt] for option in options for alt in word.split("/")]
        else:
            # No alternatives: extend every option with the same word
            for option in options:
                option.append(word)
    lines.append(options)
print(lines)
Output:
[[['someone', 'can', 'tell'], ['someone', 'can', 'figure']], [['a', 'squeaky', 'wheel', 'gets', 'the', 'grease'], ['a', 'squeaky', 'wheel', 'gets', 'the', 'oil'], ['the', 'squeaky', 'wheel', 'gets', 'the', 'grease'], ['the', 'squeaky', 'wheel', 'gets', 'the', 'oil']], [['accounts', 'for', '(someone', 'or', 'something)']], [["that's", '(something', 'for', 'you'], ["that's", 'someone)', 'for', 'you'], ["there's", '(something', 'for', 'you'], ["there's", 'someone)', 'for', 'you']]]
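The same branching idea can produce finished strings and handle brackets if the line is tokenised first. In this sketch (the name expand_words and the tokenising regex are illustrative assumptions, not the answer's code), a whole "( ... )" group is kept as one token whose alternatives are separated by " or " or "/", and each word list is joined back into a sentence at the end.

```python
import re


def expand_words(line):
    # A token is either a whole "( ... )" group or a run of non-space characters,
    # so "(someone or something)" stays one token instead of three.
    tokens = re.findall(r'\([^()]*\)|\S+', line)
    options = [[]]  # every partial sentence built so far, as a list of words
    for token in tokens:
        if token.startswith('(') and token.endswith(')'):
            # Assumption: inside brackets both " or " and "/" separate alternatives
            alternatives = re.split(r' or |/', token[1:-1])
        else:
            alternatives = token.split('/')
        # Duplicate every partial sentence once per alternative
        options = [option + [alt] for option in options for alt in alternatives]
    return [' '.join(option) for option in options]


print(expand_words("accounts for (someone or something)"))
print(expand_words("that's/there's (something/someone) for you"))
```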
Answer 4:
This solves the "/" and "or" cases. I didn't handle the brackets, but that should be easy to do :)
lines = ["someone can tell/figure",
         "a/the squeaky wheel gets the grease/oil",
         "accounts for (someone or something)",
         "that's/there's (something/someone) for you"]
result = []
for line in lines:
    sequences = []
    skip = False
    # " or " with spaces, so that words like "for" don't count as "or"
    if "/" in line or " or " in line:
        words = line.split(" ")
        for index, word in enumerate(words):
            if skip:
                # The word after "or" was already used as an alternative
                skip = False
                continue
            if "/" in word:
                options = word.split("/")
                if sequences:
                    # Extend every partial sentence with every alternative
                    sequences = [f"{seq} {opt}" for seq in sequences for opt in options]
                else:
                    sequences = options
            elif word == "or":
                # The alternatives are the words before and after "or"
                options = [words[index - 1], words[index + 1]]
                if sequences:
                    # Drop the previous word (already appended), then add each alternative
                    sequences = [f"{' '.join(seq.split(' ')[:-1])} {opt}"
                                 for seq in sequences for opt in options]
                else:
                    sequences = options
                skip = True
            else:
                if sequences:
                    sequences = [f"{seq} {word}" for seq in sequences]
                else:
                    sequences.append(word)
    else:
        sequences = [line]
    result.append(sequences)
    print(sequences)
Output:
['someone can tell', 'someone can figure']
['a squeaky wheel gets the grease', 'a squeaky wheel gets the oil', 'the squeaky wheel gets the grease', 'the squeaky wheel gets the oil']
['accounts for (someone', 'accounts for something)']
["that's (something for you", "that's someone) for you", "there's (something for you", "there's someone) for you"]
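For the four example lines, the bracket handling left out above can be done as a post-processing step, because the only leftovers are a "(" on the first alternative and a ")" on the last. A minimal sketch, assuming brackets never appear in the phrases for any other reason (strip_brackets is a hypothetical helper name):

```python
def strip_brackets(sequences):
    # Remove the "(" and ")" left attached to the first and last alternative
    # of a bracketed group. Assumes brackets only ever mark such groups.
    return [seq.replace("(", "").replace(")", "") for seq in sequences]


print(strip_brackets(['accounts for (someone', 'accounts for something)']))
# ['accounts for someone', 'accounts for something']
```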
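Answer 5:
I wrote a script for you, but I can see I'm not the first. I solved your problem with recursion: each call picks one bracketed group and splits it on "/" or " or ", or otherwise picks one word containing "/". It is more complex than the other answers, but it also handles brackets (before this, only marcin's answer did), and I think you can adapt it quickly.
Edit: I changed the main function so that the script opens a text file. Run it as: python3 bobafitsscript.py textfile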
#!/usr/bin/env python3
import sys
def sep_by_slash(word):
    # Split a word at its first "/" (helper; not used by sep below)
    pos = word.index('/')
    return (word[:pos], word[pos+1:])
def sep(s):
    if '(' in s:
        # Verify that the brackets are placed correctly
        pos_start = s.index('(')
        pos_end = s.find(')')  # -1 if there is no closing bracket
        if pos_end < pos_start:
            print("ERROR: mistake with brackets.")
            return []
        str_start = s[:pos_start]
        str_middle = s[pos_start+1:pos_end]
        str_end = s[pos_end+1:]
        if '/' in str_middle:
            variants = [str_start + possible_word + str_end
                        for possible_word in str_middle.split('/')]
            return [finite for variant in variants for finite in sep(variant)]
        elif ' or ' in str_middle:
            variants = [str_start + possible_word + str_end
                        for possible_word in str_middle.split(' or ')]
            return [finite for variant in variants for finite in sep(variant)]
        # No alternatives inside: drop the brackets and keep expanding the rest
        return sep(str_start + str_middle + str_end)
    if '/' in s:
        wordlist = s.split()
        for i, word in enumerate(wordlist):
            if '/' in word:
                # Replace the first slashed word by each of its alternatives
                variants = [' '.join(wordlist[:i] + [possible_word] + wordlist[i+1:])
                            for possible_word in word.split('/')]
                return [finite for variant in variants for finite in sep(variant)]
    return [s]
def main(args):
    # Expand every line of the text file given as the first argument
    with open(args[1], 'r') as textfile:
        for line in textfile:
            for var in sep(line.strip()):
                print(var)
    return 0
def main_directly(args):
    # Alternative entry point: expand a single string given on the command line
    for line in sep(args[1]):
        print(line)
    return 0
if __name__ == '__main__':
    sys.exit(main(sys.argv))
Output:
someone can tell
someone can figure
a squeaky wheel gets the grease
a squeaky wheel gets the oil
the squeaky wheel gets the grease
the squeaky wheel gets the oil
accounts for someone
accounts for something
that's something for you
there's something for you
that's someone for you
there's someone for you
Comments:
Sorry, I'm a newbie, but how do I use this script?
You can save it and let Python run it: python3 bobafists-s-rcypt.py "I am the/a string."
Or you can copy the function sep and then run sep("I am the/a string.") in Python to get a list of all possible strings.
But I want to read a text file that contains hundreds of phrases like these.
Then you can parse them somehow. If you use bash, do this: while read line;do python3 bobafists-s-rcypt.py "$line";done < yourtextfile.txt
There are ways to do this in Python too; you can find them on duckduckgo.com ;-)
OK, I edited my code and answer so that you can read directly from the file. :-)