Elegant Python - implementing glob style patterns
When it comes to wildcards, most people immediately think of `*` and `?`.
Wildcards greatly increase how much a pattern can express. Many Linux commands support them, and this is exactly what is called a glob style pattern.
Even the Redis KEYS command supports glob patterns.
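The glob I implement here supports the following features:
- `*` matches zero or more arbitrary characters
- `?` matches exactly one arbitrary character
- `[characters]` matches any one of the characters inside the brackets; for example, `[abc]` matches either a, b, or c
- `[!characters]` matches any one character that is not inside the brackets
- `[character-character]` matches any character in the inclusive range between the two characters, e.g. `[a-z]`, `[0-9]`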
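For reference, Python's standard `fnmatch` module implements the same five constructs, and `fnmatchcase` applies them case-sensitively with no OS-specific normalization, so it can serve as an oracle for the semantics listed above. The sample strings below are made up for illustration; one or two pairs exercise each feature.

```python
from fnmatch import fnmatchcase

# (string, pattern) pairs, one or two per feature in the list above
examples = [
    ("readme.txt", "*.txt"),  # '*' matches "readme"
    (".txt", "*.txt"),        # '*' may also match nothing
    ("a1", "a?"),             # '?' matches exactly one character
    ("a", "a?"),              # ... so it cannot match zero characters
    ("b", "[abc]"),           # one of the listed characters
    ("d", "[!abc]"),          # '!' negates the whole list
    ("a", "[!abc]"),
    ("q", "[a-z]"),           # inclusive range
    ("Q", "[a-z]"),           # case-sensitive: 'Q' is outside a-z
]

for s, p in examples:
    print(s, p, fnmatchcase(s, p))
```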
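Implementing this is actually quite simple: scan the string s and the pattern p from left to right, and if both scans reach the end, s matches p.
The main difficulty is `*`. Because `*` can match zero or more characters, the matcher has to try and backtrack. Here we save the position of the most recent `*` (together with the position in s where it started); if the rest of the pattern hits a dead end, we jump back to just after that `*`, let it absorb one more character of s, and try again.
As for expanding the bracket groups, keeping an include set and an exclude set makes it very clear: `[abc]` and `[a-z]` fill the include set, while `[!...]` fills the exclude set.
Here is the full code.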
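A minimal sketch of the saved-star backtracking described above, restricted to `*` and `?` (no brackets) so the technique stands out. `wildcard_match` is a hypothetical name; `star` holds the index of the last `*` in p and `mark` the position in s where that `*` currently stops.

```python
def wildcard_match(s, p):
    """Return True if all of s matches p, where p uses only '*' and '?'."""
    i = j = 0     # i indexes s, j indexes p
    star = -1     # index of the last '*' seen in p, -1 if none yet
    mark = 0      # position in s where that '*' currently stops
    while i < len(s):
        if j < len(p) and p[j] == '*':
            star, mark = j, i      # first let '*' match zero characters
            j += 1
        elif j < len(p) and (p[j] == '?' or p[j] == s[i]):
            i += 1
            j += 1
        elif star != -1:
            mark += 1              # widen the '*' by one character and retry
            i = mark
            j = star + 1
        else:
            return False           # mismatch and no '*' to fall back on
    while j < len(p) and p[j] == '*':
        j += 1                     # trailing '*' can match the empty rest
    return j == len(p)


print(wildcard_match("cxyzbazba", "c*ba"))  # True
print(wildcard_match("aab", "c*a*b"))       # False
```

Remembering only the most recent `*` is enough here: every other pattern token consumes exactly one character, so any extra characters an earlier `*` could absorb can equally be absorbed by the later one.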
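The include/exclude idea for a single bracket group can be sketched on its own. `parse_bracket` and `bracket_accepts` are hypothetical helper names, not the author's; the sketch assumes a `!` right after `[` negates the whole group (as in shell globs) and raises `ValueError` when there is no closing `]`.

```python
def parse_bracket(p, i):
    """Parse the bracket group that opens at p[i] == '['.

    Returns (chars, negated, next_i): the listed characters with ranges
    expanded, whether the group starts with '!', and the index just past
    the closing ']'.
    """
    j = i + 1
    negated = j < len(p) and p[j] == '!'   # '!' negates the whole group
    if negated:
        j += 1
    chars = set()
    while j < len(p) and p[j] != ']':
        if j + 2 < len(p) and p[j + 1] == '-' and p[j + 2] != ']':
            # range such as a-z, both ends inclusive
            chars.update(chr(c) for c in range(ord(p[j]), ord(p[j + 2]) + 1))
            j += 3
        else:
            chars.add(p[j])
            j += 1
    if j >= len(p):
        raise ValueError("no closing ']' for '[' at index %d" % i)
    return chars, negated, j + 1


def bracket_accepts(ch, chars, negated):
    """Include set: ch must be listed. Exclude set: ch must not be listed."""
    return (ch not in chars) if negated else (ch in chars)


chars, negated, nxt = parse_bracket("ab[!e]", 2)
print(chars, negated, nxt)                   # {'e'} True 6
print(bracket_accepts("e", chars, negated))  # False
```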
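```python
def build_expand(p):
    """Pre-parse every [...] group in pattern p (bracket expansion).

    Returns three dicts keyed by the index of the opening '[':
      ptr2include -- characters the group accepts, e.g. [abc] or [a-c]
      ptr2exclude -- characters the group rejects, e.g. [!abc]
      ptr2next    -- index just past the closing ']'
    A '[' with no closing ']' is left out and later matched literally.
    """
    ptr2include, ptr2exclude, ptr2next = {}, {}, {}
    len_p = len(p)
    pPtr = 0
    while pPtr < len_p:
        if p[pPtr] != '[':
            pPtr += 1
            continue
        start = pPtr
        pPtr += 1
        # '!' right after '[' negates the whole group
        negate = pPtr < len_p and p[pPtr] == '!'
        if negate:
            pPtr += 1
        chars = set()
        while pPtr < len_p and p[pPtr] != ']':
            if pPtr + 2 < len_p and p[pPtr + 1] == '-' and p[pPtr + 2] != ']':
                # range such as a-z: every character from p[pPtr] to p[pPtr+2]
                chars.update(chr(x) for x in range(ord(p[pPtr]), ord(p[pPtr + 2]) + 1))
                pPtr += 3
            else:
                chars.add(p[pPtr])
                pPtr += 1
        if pPtr >= len_p:
            # unterminated '[': rescan from the next character
            pPtr = start + 1
            continue
        if negate:
            ptr2exclude[start] = chars
        else:
            ptr2include[start] = chars
        ptr2next[start] = pPtr + 1
        pPtr += 1
    return ptr2include, ptr2exclude, ptr2next


def isMatch(s, p):
    """Return True if the whole string s matches the glob pattern p."""
    len_s, len_p = len(s), len(p)
    ptr2include, ptr2exclude, ptr2next = build_expand(p)
    sPtr = pPtr = 0
    star = None  # index of the most recent '*' in p
    ss = 0       # index in s where that '*' currently stops
    while sPtr < len_s:
        if pPtr < len_p:
            if p[pPtr] == '*':
                # remember the '*' and first let it match zero characters
                star, ss = pPtr, sPtr
                pPtr += 1
                continue
            if pPtr in ptr2next:
                # bracket group: test s[sPtr] against its include/exclude set
                if pPtr in ptr2include:
                    ok = s[sPtr] in ptr2include[pPtr]
                else:
                    ok = s[sPtr] not in ptr2exclude[pPtr]
                if ok:
                    sPtr += 1
                    pPtr = ptr2next[pPtr]
                    continue
            elif p[pPtr] in ('?', s[sPtr]):
                sPtr += 1
                pPtr += 1
                continue
        if star is not None:
            # dead end: let the last '*' absorb one more character and retry
            ss += 1
            sPtr = ss
            pPtr = star + 1
            continue
        return False
    # s is used up; whatever is left of p must be all '*'
    while pPtr < len_p and p[pPtr] == '*':
        pPtr += 1
    return pPtr == len_p


if __name__ == '__main__':
    params = [
        ("aa", "a"),
        ("aa", "aa"),
        ("aaa", "aa"),
        ("aa", "*"),
        ("aa", "a*"),
        ("ab", "?*"),
        ("aab", "c*a*b"),
        ("cab", "c*a*b"),
        ("cxyzbazba", "c*ba"),
        ("abc", "ab[a-c]"),
        ("abd", "ab[a-c]"),
        ("abe", "ab[cde]"),
        ("abe", "ab[!e]"),
        ("abe", "ab[!c]"),
    ]
    for case in params:
        print(case, isMatch(*case))
```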
The output is:
```
('aa', 'a') False
('aa', 'aa') True
('aaa', 'aa') False
('aa', '*') True
('aa', 'a*') True
('ab', '?*') True
('aab', 'c*a*b') False
('cab', 'c*a*b') True
('cxyzbazba', 'c*ba') True
('abc', 'ab[a-c]') True
('abd', 'ab[a-c]') False
('abe', 'ab[cde]') True
('abe', 'ab[!e]') False
('abe', 'ab[!c]') True
```
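As a sanity check, each result listed above can be compared with the standard library's `fnmatch.fnmatchcase`, which implements the same `*`, `?`, `[...]`, `[!...]` and range syntax. The sketch below only encodes the listed cases and asserts that the two agree.

```python
from fnmatch import fnmatchcase

# (string, pattern, result printed by isMatch above)
cases = [
    ("aa", "a", False),
    ("aa", "aa", True),
    ("aaa", "aa", False),
    ("aa", "*", True),
    ("aa", "a*", True),
    ("ab", "?*", True),
    ("aab", "c*a*b", False),
    ("cab", "c*a*b", True),
    ("cxyzbazba", "c*ba", True),
    ("abc", "ab[a-c]", True),
    ("abd", "ab[a-c]", False),
    ("abe", "ab[cde]", True),
    ("abe", "ab[!e]", False),
    ("abe", "ab[!c]", True),
]

for s, p, expected in cases:
    assert fnmatchcase(s, p) == expected, (s, p)
print("all", len(cases), "results agree with fnmatchcase")
```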