Python Regex: Matching a character on multi line?


【Title】: Python Regex: Matching a character on multi line? 【Posted】: 2013-07-04 21:05:26 【Question】:

I'm working through the challenges on pythonchallenge.com, but I'm having trouble with a regular expression.

For example, suppose we have the following text:

hello world
<!--
%%$@_$^__#)^)&!_+]!*@&^@[@%]()%+$&[(_@%+%$*^@$^!+]!&_#)_*!_]$[%@[_@#_^*
@##&#&&)*%(]([*@[@&]+!!*)!%+))])[!^)+)$]#*+^((@^@$[*a*$&^$!@#$%)!@(&bc  

I want to collect the characters a, b and c into a string (from the text above), but not hello world. How can I do that?

I know I can do the following in Python:

x = "".join(re.findall("regex", data))

However, I'm having trouble with the regex itself. I'm testing it in a regex tester, but it doesn't seem to do what I want.

Here is my regex:

<!--[a-z]*

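As I understand it (after reading the regex-expression.info tutorial), this expression should find all the characters after the specified string, giving the output abc.

However, this doesn't work. As far as I understand, the characters in <!-- are not special characters either, since none of them is one of [\^$.|?*+().

How can I make this regex work the way I want, so that it includes abc but not hello world?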
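
A minimal sketch of why the attempt above comes back without any letters, assuming the text is held in a variable `data` (the shortened sample and the variable names are illustrative, not from the question). `[a-z]*` has to match directly after `<!--`, and the next character there is a newline, so it matches the empty string. Searching only the part after the marker is one way around that.

```python
import re

# Shortened stand-in for the text in the question (illustrative only).
data = (
    "hello world\n"
    "<!--\n"
    "%%$@_$^__#)^)&!_+]!*@&^@[@%]\n"
    "@##&#&&)*%(]([*a*$&^$!@#$%)!@(&bc"
)

# The original attempt: [a-z]* must start right after "<!--",
# where the next character is a newline, so it matches zero letters.
attempt = re.findall(r"<!--[a-z]*", data)

# One possible fix: look only at the text after the first "<!--"
# and collect single lowercase letters from it.
after_marker = data.split("<!--", 1)[1]
letters = "".join(re.findall(r"[a-z]", after_marker))

print(attempt)
print(letters)
```

Here `attempt` contains only the marker itself, while `letters` holds the characters the question asks for.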

【Answer 1】:
import re

# Sample text: 'hello world' occurs at the very start, mid-line, and after '<!--'.
su = '''hello world
xxxx hello world yyyy
<!--
_+]!yuyu*@&^@?!hello world[@%]^@$[*a*$&^!@(&bc??,=hello'''

print(su)

# ([a-z]+)(?![a-z]) matches a whole run of lowercase letters; the negative
# lookbehind placed after it rejects runs that end with the given text.
pat = r'([a-z]+)(?![a-z])(?<!world)'
print("\nexcluding all the words 'world'\n%s" % pat)
print(re.findall(pat,su))

# \A anchors the lookbehind to the very start of the string.
pat = r'([a-z]+)(?![a-z])(?<!\Ahello world)'
print("\nexcluding the word 'world' of the starting string 'hello world'\n%s" % pat)
print(re.findall(pat,su))

pat = r'([a-z]+)(?![a-z])(?<!hello world)'
print("\nexcluding all the words 'world' of a string 'hello world'\n%s" % pat)
print(re.findall(pat,su))

print('\n-----------')

pat = r'([a-z]+)(?![a-z])(?<!hello)'
print("\nexcluding all the words 'hello'\n%s" % pat)
print(re.findall(pat,su))

pat = r'([a-z]+)(?![a-z])(?<!\Ahello)'
print("\nexcluding the starting word 'hello'\n%s" % pat)
print(re.findall(pat,su))

# The lookahead inside the lookbehind checks what follows the rejected word.
pat = r'([a-z]+)(?![a-z])(?<!hello(?= world))'
print("\nexcluding all the words 'hello' of a string 'hello world'\n%s" % pat)
print(re.findall(pat,su))

print('\n-----------')

# Both alternatives have the same length, as a lookbehind in re requires.
pat = r'([a-z]+)(?![a-z])(?<!hello|world)'
print("\nexcluding all the words 'hello' and 'world'\n%s" % pat)
print(re.findall(pat,su))

pat = r'([a-z]+)(?![a-z])(?<!hello(?= world))(?<!hello world)'
print("\nexcluding all the words of a string 'hello world'\n%s" % pat)
print(re.findall(pat,su))

pat = r'([a-z]+)(?![a-z])(?<!\Ahello(?= world))(?<!\Ahello world)'
print("\nexcluding all the words of the starting string 'hello world'\n%s" % pat)
print(re.findall(pat,su))

Result

hello world
xxxx hello world yyyy
<!--
_+]!yuyu*@&^@?!hello world[@%]^@$[*a*$&^!@(&bc??,=hello

excluding all the words 'world'
([a-z]+)(?![a-z])(?<!world)
['hello', 'xxxx', 'hello', 'yyyy', 'yuyu', 'hello', 'a', 'bc', 'hello']

excluding the word 'world' of the starting string 'hello world'
([a-z]+)(?![a-z])(?<!\Ahello world)
['hello', 'xxxx', 'hello', 'world', 'yyyy', 'yuyu', 'hello', 'world', 'a', 'bc', 'hello']

excluding all the words 'world' of a string 'hello world'
([a-z]+)(?![a-z])(?<!hello world)
['hello', 'xxxx', 'hello', 'yyyy', 'yuyu', 'hello', 'a', 'bc', 'hello']

-----------

excluding all the words 'hello'
([a-z]+)(?![a-z])(?<!hello)
['world', 'xxxx', 'world', 'yyyy', 'yuyu', 'world', 'a', 'bc']

excluding the starting word 'hello'
([a-z]+)(?![a-z])(?<!\Ahello)
['world', 'xxxx', 'hello', 'world', 'yyyy', 'yuyu', 'hello', 'world', 'a', 'bc', 'hello']

excluding all the words 'hello' of a string 'hello world'
([a-z]+)(?![a-z])(?<!hello(?= world))
['world', 'xxxx', 'world', 'yyyy', 'yuyu', 'world', 'a', 'bc', 'hello']

-----------

excluding all the words 'hello' and 'world'
([a-z]+)(?![a-z])(?<!hello|world)
['xxxx', 'yyyy', 'yuyu', 'a', 'bc']

excluding all the words of a string 'hello world'
([a-z]+)(?![a-z])(?<!hello(?= world))(?<!hello world)
['xxxx', 'yyyy', 'yuyu', 'a', 'bc', 'hello']

excluding all the words of the starting string 'hello world'
([a-z]+)(?![a-z])(?<!\Ahello(?= world))(?<!\Ahello world)
['xxxx', 'hello', 'world', 'yyyy', 'yuyu', 'hello', 'world', 'a', 'bc', 'hello']
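
The patterns above all share one shape: `([a-z]+)(?![a-z])` grabs a whole run of lowercase letters, and a negative lookbehind placed after it rejects the run if the text ending at that point is a given word. A small sketch of that idea follows; the helper name `words_except` and the sample string are assumptions made for illustration, not part of the answer.

```python
import re

def words_except(text, banned):
    """Return runs of lowercase letters in text that do not end with banned."""
    # (?![a-z]) forces the run to stop only where the letters stop, so the
    # lookbehind is evaluated at the end of the run; re.escape guards any
    # regex metacharacters in the banned word.
    pattern = r"([a-z]+)(?![a-z])(?<!%s)" % re.escape(banned)
    return re.findall(pattern, text)

sample = "hello world\n<!--\n*a*$&bc"
print(words_except(sample, "world"))
print(words_except(sample, "hello"))
```

Without the `(?![a-z])` part the engine could backtrack to a shorter run such as `worl`, which no longer ends in the banned word and would slip through.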

If you only want to capture what comes after a certain pattern in the analysed string:

import re

# Same text as before, with a second <!-- added to show that only the first one counts.
su = '''hello world
xxxx hello world yyyy
<!--
_+]!yuyu*@&^@?!hello world[@%]^@$[*a*$& <!-- ^!@(&bc??,=hello'''

print(su)

# The first alternative consumes everything up to the first <!-- and leaves
# group 1 empty; only the letter runs matched afterwards fill group 1.
print("\ncatching all the lettered strings after first <!--")
print("re.compile('^.+?<!--|([a-z]+)',re.DOTALL)")
rgx = re.compile('^.+?<!--|([a-z]+)',re.DOTALL)
print([x.group(1) for x in rgx.finditer(su) if x.group(1)])

print("\ncatching all the lettered strings after first <!--\n"
      "excluding all the words 'world'")
print("re.compile('^.+?<!--|([a-z]+)(?![a-z])(?<!world)',re.DOTALL)")
rgx = re.compile('^.+?<!--|([a-z]+)(?![a-z])(?<!world)',re.DOTALL)
print([x.group(1) for x in rgx.finditer(su) if x.group(1)])

print("\ncatching all the lettered strings after first <!--\n"
      "excluding all the words 'hello'")
print("re.compile('^.+?<!--|([a-z]+)(?![a-z])(?<!hello)',re.DOTALL)")
rgx = re.compile('^.+?<!--|([a-z]+)(?![a-z])(?<!hello)',re.DOTALL)
print([x.group(1) for x in rgx.finditer(su) if x.group(1)])

print("\ncatching all the lettered strings after first <!--\n"
      "excluding all the words 'hello' belonging to a string 'hello world'")
print("re.compile('^.+?<!--|([a-z]+)(?![a-z])(?<!hello(?= world))',re.DOTALL)")
rgx = re.compile('^.+?<!--|([a-z]+)(?![a-z])(?<!hello(?= world))',re.DOTALL)
print([x.group(1) for x in rgx.finditer(su) if x.group(1)])

Result

hello world
xxxx hello world yyyy
<!--
_+]!yuyu*@&^@?!hello world[@%]^@$[*a*$& <!-- ^!@(&bc??,=hello

catching all the lettered strings after first <!--
re.compile('^.+?<!--|([a-z]+)',re.DOTALL)
['yuyu', 'hello', 'world', 'a', 'bc', 'hello']

catching all the lettered strings after first <!--
excluding all the words 'world'
re.compile('^.+?<!--|([a-z]+)(?![a-z])(?<!world)',re.DOTALL)
['yuyu', 'hello', 'a', 'bc', 'hello']

catching all the lettered strings after first <!--
excluding all the words 'hello'
re.compile('^.+?<!--|([a-z]+)(?![a-z])(?<!hello)',re.DOTALL)
['yuyu', 'world', 'a', 'bc']

catching all the lettered strings after first <!--
excluding all the words 'hello' belonging to a string 'hello world'
re.compile('^.+?<!--|([a-z]+)(?![a-z])(?<!hello(?= world))',re.DOTALL)
['yuyu', 'world', 'a', 'bc', 'hello']
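
The `^.+?<!--|([a-z]+)` pattern works by letting its first alternative swallow everything up to and including the first `<!--`. That match leaves group 1 empty, so only the letter runs found afterwards survive the `if x.group(1)` filter. A hedged sketch of the same trick, using a hypothetical helper `letters_after_first` and a made-up sample string:

```python
import re

def letters_after_first(marker, text):
    """Collect runs of lowercase letters that follow the first marker."""
    # Alternative 1 is anchored with ^, so it can only match once, at the
    # start of the text; it consumes the prefix and leaves group 1 as None.
    rgx = re.compile(r"^.+?%s|([a-z]+)" % re.escape(marker), re.DOTALL)
    return [m.group(1) for m in rgx.finditer(text) if m.group(1)]

sample = "hello world\n<!--\n*a*$& <!-- (&bc"
print(letters_after_first("<!--", sample))
```

One caveat of this approach: if the marker is absent, the first alternative never matches and every letter run in the text is returned.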

【Answer 2】:
>>> import re
>>> strs = """hello world
<!--
%%$@_$^__#)^)&!_+]!*@&^@[@%]()%+$&[(_@%+%$*^@$^!+]!&_#)_*!_]$[%@[_@#_^*
@##&#&&)*%(]([*@[@&]+!!*)!%+))])[!^)+)$]#*+^((@^@$[*a*$&^$!@#$%)!@(&bc"""
>>> re.findall(r'[a-zA-Z]+',strs.split('<!--')[-1])
['a', 'bc']
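
This answer sidesteps the multi-line problem by cutting the string at the marker before any regex runs. As a sketch of the same idea, the hypothetical helper below uses `str.rpartition` instead of `split(...)[-1]` so that a missing marker can be detected; the helper name and the shortened sample text are assumptions, not part of the answer.

```python
import re

def letters_after_last(marker, text):
    """Return the letters found after the last occurrence of marker."""
    # rpartition splits at the last marker, like split(marker)[-1], but
    # also reports (via an empty separator) when the marker is missing.
    _head, sep, tail = text.rpartition(marker)
    if not sep:
        return ""
    return "".join(re.findall(r"[a-zA-Z]+", tail))

strs = "hello world\n<!--\n%%$@_$^__#)^)&!_+]\n@##&#&&)*%(]([*a*$&^$!@#$%)!@(&bc"
print(letters_after_last("<!--", strs))
```

With plain `split`, a text that lacks the marker would be returned whole, so `hello world` would leak into the result; returning an empty string in that case is a design choice of this sketch.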
