Regex contains words if no negative words are before it [duplicate]
Posted: 2020-12-27 00:52:12

Question: I want to grab phrases that say good or great, but only when they are not negated by a preceding not or isn't.
sents= ["good words", # Words after phrase
"not good words",
"isn't good words",
"great words",
"not great words",
"isn't great words",
"words good", # Words before phrase
"words not good",
"words isn't good",
"words great",
"words not great",
"words isn't great",
"words good words", # Words before and after phrase
"words not good words",
"words isn't good words",
"words great words",
"words not great words",
"words isn't great words",
]
I want to get back:
good words
words good
words good words
great words
words great
words great words
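What regex would let me do this? Ideally, I would like to have a list of words and find one only when it is not negated by any word from a list of negative words.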
Answer 1: You can use this regex in Python with 2 negative lookbehind assertions:
(?<!isn't )(?<!not )\b(?:good|great)\b
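RegEx details:
- (?<!isn't ): Negative lookbehind that fails the match if the word is preceded by isn't followed by a space
- (?<!not ): Negative lookbehind that fails the match if the word is preceded by not followed by a space
- \b: Word boundary
- (?:good|great): Match good or great
- \b: Word boundary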
Code:
>>> import re
>>> sents= ["good words", # Words after phrase
... "not good words",
... "isn't good words",
... "great words",
... "not great words",
... "isn't great words",
... "words good", # Words before phrase
... "words not good",
... "words isn't good",
... "words great",
... "words not great",
... "words isn't great",
... "words good words", # Words before and after phrase
... "words not good words",
... "words isn't good words",
... "words great words",
... "words not great words",
... "words isn't great words",
... ]
>>> reg = re.compile(r"(?<!isn't )(?<!not )\b(?:good|great)\b")
>>> for s in sents:
...     if reg.search(s):
...         print(s)
...
good words
great words
words good
words great
words good words
words great words
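The question also asks about working from word lists. One way to do that is to generate this answer's pattern from two Python lists. This is a minimal sketch, assuming a negation only counts when it is the word directly before the match, separated by a single space; the list contents and the build_pattern helper are hypothetical. Python's re module requires lookbehinds to be fixed-width, so the sketch emits one (?<!...) per negation instead of a single alternation such as (?<!not |isn't ).

```python
import re

# Hypothetical word lists; swap in your own.
POSITIVE = ["good", "great"]
NEGATIONS = ["not", "isn't"]

def build_pattern(positive, negations):
    """Compile a regex matching any positive word not directly preceded by '<negation> '."""
    # One fixed-width negative lookbehind per negation word.
    lookbehinds = "".join(f"(?<!{re.escape(neg)} )" for neg in negations)
    # Escape each positive word and join them into one non-capturing group.
    words = "|".join(re.escape(word) for word in positive)
    return re.compile(rf"{lookbehinds}\b(?:{words})\b")

reg = build_pattern(POSITIVE, NEGATIONS)

sents = ["good words", "words not good", "words isn't great words", "words great"]
print([s for s in sents if reg.search(s)])  # ['good words', 'words great']

# The lookbehinds only inspect the text directly before the word,
# so a negation further away is not detected:
print(bool(reg.search("not very good")))  # True
```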
Answer 2: You need to use a lookbehind, in this case a negative one, since there is also a positive version. You can use it simply like this:
(?<!not\s)great
In this example, the word not must not appear directly before great.
Extended to both negations and both words, it looks like this:
(?<!not\s)(?<!isn't\s)(great|good)
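This pattern and the one from the first answer are not fully interchangeable: the first answer's pattern wraps the words in \b word boundaries and requires a literal space after the negation, while this one has no \b and accepts any whitespace character (\s). A quick check with Python's re module on a few made-up inputs shows where that changes the result.

```python
import re

answer1 = re.compile(r"(?<!isn't )(?<!not )\b(?:good|great)\b")
answer2 = re.compile(r"(?<!not\s)(?<!isn't\s)(great|good)")

# Made-up inputs chosen to show the two differences.
examples = [
    "goodness",   # no \b in answer2, so "good" inside "goodness" matches
    "notgood",    # no \b in answer2, and "not" has no whitespace after it
    "not\tgood",  # \s in answer2 also rejects a tab; answer1 needs a space
]
for s in examples:
    print(f"{s!r}: answer1={bool(answer1.search(s))}, answer2={bool(answer2.search(s))}")
```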