Regex contains words if no negative words are before it [duplicate]


[Title]: Regex contains words if no negative words are before it [duplicate]
[Posted]: 2020-12-27 00:52:12
[Question]:

I want to grab phrases that say good or great but are not negated by a preceding not or isn't.

sents = ["good words",                  # Words after phrase
         "not good words",
         "isn't good words",

         "great words",
         "not great words",
         "isn't great words",

         "words good",                  # Words before phrase
         "words not good",
         "words isn't good",

         "words great",
         "words not great",
         "words isn't great",

         "words good words",            # Words before and after phrase
         "words not good words",
         "words isn't good words",

         "words great words",
         "words not great words",
         "words isn't great words",
]

I want to get back:

good words
words good
words good words

great words
words great
words great words

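What regex would let me do this? Ideally, I would like to be able to supply a list of words and have them found only if the string does not contain any word from a list of negating words.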


[Answer 1]:

You can use this regex with 2 negative lookbehind assertions in Python:

(?<!isn't )(?<!not )\b(?:good|great)\b


Regex details:

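(?<!isn't ): negative lookbehind that fails the match if isn't followed by a space comes immediately before the current position
(?<!not ): negative lookbehind that fails the match if not followed by a space comes immediately before the current position
\b: word boundary
(?:good|great): matches good or great
\b: word boundary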

Code:

>>> import re
>>> sents = ["good words",                  # Words after phrase
...         "not good words",
...         "isn't good words",
...         "great words",
...         "not great words",
...         "isn't great words",
...         "words good",                   # Words before phrase
...         "words not good",
...         "words isn't good",
...         "words great",
...         "words not great",
...         "words isn't great",
...         "words good words",             # Words before and after phrase
...         "words not good words",
...         "words isn't good words",
...         "words great words",
...         "words not great words",
...         "words isn't great words",
... ]
>>> reg = re.compile(r"(?<!isn't )(?<!not )\b(?:good|great)\b")
>>> for s in sents:
...     if reg.search(s):
...             print(s)
...
good words
great words
words good
words great
words good words
words great words
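
The question also asks for a version driven by word lists. A minimal sketch of one way to do that follows, assuming each negating word is separated from the target word by exactly one space; `build_pattern`, `positives`, and `negatives` are hypothetical names used only for illustration. It emits one lookbehind per negating word because Python's `re` module requires each lookbehind to have a fixed width.

```python
import re

def build_pattern(positives, negatives):
    """Compile a regex for any positive word not directly preceded by a negative word and a space."""
    # One fixed-width negative lookbehind per negating word (hypothetical helper, not from the answer).
    lookbehinds = "".join(f"(?<!{re.escape(neg)} )" for neg in negatives)
    # Alternation of the target words, escaped in case they contain regex metacharacters.
    alternatives = "|".join(re.escape(pos) for pos in positives)
    return re.compile(rf"{lookbehinds}\b(?:{alternatives})\b")

pattern = build_pattern(["good", "great"], ["not", "isn't"])
sents = ["good words", "not good words", "words isn't great", "words great words"]
kept = [s for s in sents if pattern.search(s)]
print(kept)  # ['good words', 'words great words']
```

This only checks the word immediately before the match, so a sentence such as "not very good" would still be kept.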


[Answer 2]:

You need to use a lookbehind, in this case a negative one, since there is also a positive version. You can use it simply like this:

(?<!not\s)great

In this example, the word not must not appear directly before great.

Applied to both negating words and both target words, it looks like this:

(?<!not\s)(?<!isn't\s)(great|good)
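
As a rough illustration, the pattern above can be tried in Python as shown below. The `strict` variant, which adds `\b` word boundaries, and the sample strings are assumptions added here for comparison, not part of this answer. Without the boundaries the pattern also matches good inside a longer word such as goodness.

```python
import re

# Pattern from this answer: no word boundaries around the alternatives.
loose = re.compile(r"(?<!not\s)(?<!isn't\s)(great|good)")
# Same lookbehinds with \b added (an adjustment made here for illustration).
strict = re.compile(r"(?<!not\s)(?<!isn't\s)\b(great|good)\b")

samples = ["words good", "words not good", "isn't great words", "goodness knows"]
for s in samples:
    # Report whether each variant finds a non-negated match in the sample.
    print(f"{s!r}: loose={bool(loose.search(s))}, strict={bool(strict.search(s))}")
```

The two variants agree on the first three samples and differ only on "goodness knows", which `loose` accepts and `strict` rejects.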

