飘逸的python - 实现glob style pattern

Posted

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了飘逸的python - 实现glob style pattern相关的知识,希望对你有一定的参考价值。

一说起通配符,大家非常快就会想起*和?

号,有了通配符,使得表达能力大大增强,非常多linux命令都支持这个东西,事实上就是glob style pattern.
就连redis的keys命令都支持glob.

我要实现的glob,支持下面特性:

  • 星号*匹配0个或多个随意字符
  • ?

    匹配确切的一个随意字符

  • [characters]匹配随意一个方括号内的字符,比方[abc],要么匹配a,要么匹配b,要么匹配c.
  • [!character]排除方括号内的字符
  • [character-character],表示2个字符范围内的都能够匹配,如[a-z],[0-9]

实现这个东西事实上挺简单的,从左往右扫描s串和p串,假设最后都走到了结尾,那么就是能够匹配的.
主要难点在于*号的匹配.由于*号能够匹配0个或者多个,所以须要试探回溯.这里通过保存*号位置,假设后面的走不通了,就拉回*号位置,贪婪匹配.

至于方括号的展开,弄个include和exclude变量就非常清晰了.


以下上代码.

#coding=utf-8
def build_expand(p):#方括号展开
    ptr2include = {}
    ptr2exclude = {}
    ptr2next = {}
    len_p = len(p)
    pPtr = 0
    while pPtr<len_p:
        if p[pPtr] == ‘[‘:
            start = pPtr
            pPtr += 1
            include = set([])
            exclude = set([])
            while p[pPtr]!=‘]‘:
                if p[pPtr]==‘!‘:
                    exclude.add(p[pPtr+1])
                    pPtr += 2
                elif p[pPtr+1] == ‘-‘:
                    include.update({chr(x) for x in range(ord(p[pPtr]),ord(p[pPtr+2])+1)})
                    pPtr += 3
                else:
                    include.add(p[pPtr])
                    pPtr += 1
            if include:
                ptr2include[start] = include
            if exclude:
                ptr2exclude[start] = exclude
            ptr2next[start] = pPtr + 1
        else:
            pPtr += 1
    return ptr2include, ptr2exclude, ptr2next

def isMatch(s, p):
    len_s = len(s); len_p = len(p)
    sPtr = pPtr = ss = 0
    star = None
    ptr2include, ptr2exclude, ptr2next = build_expand(p)
    while sPtr<len_s:
        if pPtr<len_p and (p[pPtr] in [‘?‘,s[sPtr]]):
            sPtr += 1; pPtr += 1
            continue
        if pPtr<len_p and p[pPtr] == ‘[‘:
            if pPtr in ptr2include and s[sPtr] in ptr2include[pPtr]:
                sPtr += 1
                pPtr = ptr2next[pPtr]
                continue
            if pPtr in ptr2exclude and s[sPtr] not in ptr2exclude[pPtr]:
                sPtr += 1
                pPtr = ptr2next[pPtr]
                continue
        if pPtr<len_p and p[pPtr]==‘*‘:
            star = pPtr; pPtr += 1; ss = sPtr
            continue
        if star is not None:
            pPtr = star + 1; ss += 1; sPtr = ss
            continue
        return False
    while pPtr<len(p) and p[pPtr]==‘*‘:
        pPtr += 1
    return pPtr == len_p

if __name__ == ‘__main__‘:
    params = [
            ("aa","a"),
            ("aa","aa"),
            ("aaa","aa"),
            ("aa", "*"),
            ("aa", "a*"),
            ("ab", "?*"),
            ("aab", "c*a*b"),
            ("cab", "c*a*b"),
            ("cxyzbazba", "c*ba"),
            (‘abc‘,‘ab[a-c]‘),
            (‘abd‘,‘ab[a-c]‘),
            (‘abe‘,‘ab[cde]‘),
            (‘abe‘,‘ab[!e]‘),
            (‘abe‘,‘ab[!c]‘),
        ]

    for p in params:
        print p,isMatch(*p)

执行结果是

(‘aa‘, ‘a‘) False
(‘aa‘, ‘aa‘) True
(‘aaa‘, ‘aa‘) False
(‘aa‘, ‘*‘) True
(‘aa‘, ‘a*‘) True
(‘ab‘, ‘?

*‘) True
(‘aab‘, ‘c*a*b‘) False
(‘cab‘, ‘c*a*b‘) True
(‘cxyzbazba‘, ‘c*ba‘) True
(‘abc‘, ‘ab[a-c]‘) True
(‘abd‘, ‘ab[a-c]‘) False
(‘abe‘, ‘ab[cde]‘) True
(‘abe‘, ‘ab[!e]‘) False
(‘abe‘, ‘ab[!c]‘) True




























以上是关于飘逸的python - 实现glob style pattern的主要内容,如果未能解决你的问题,请参考以下文章

飘逸的python - 实现一个pretty函数美丽的输出嵌套字典

飘逸的python - 不使用keyword,求和1+2+…+n

飘逸的python - 单例模式乱弹

飘逸的python - 装饰器的本质

飘逸的python - ord和chr以及unichr

飘逸的python - 增强的格式化字符串format函数