Python Regular Expressions

Posted Rolei_zl

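    I rarely use regular expressions (why? I don't know, although they are useful), so every time I need one I have to look it up again and re-test it before using it.
    This post organizes the help text of Python's re module for study and understanding. Items marked in blue and blank cells still need to be completed.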

  1. Functions
  2. Special characters
  3. Special sequences
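
Functions

This module exports the following functions. The examples use s = 'abca'.
Each entry lists the identifier, the English help text, a note (translated from the Chinese description), the usage, and examples.
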
match
    Help text: Match a regular expression pattern to the beginning of a string.
    Note: Matching starts at the first character of the string.
    Usage: match(pattern, string, flags=0)
    Example:
        re.match('a',s)
        >>> <re.Match object; span=(0, 1), match='a'>
        re.match('a',s)[0]
        >>> a
        re.match('ab',s)
        >>> <re.Match object; span=(0, 2), match='ab'>
        re.match('c',s)
        >>> None

fullmatch
    Help text: Match a regular expression pattern to all of a string.
    Note: The whole string must match the pattern.
    Usage: fullmatch(pattern, string, flags=0)
    Example:
        re.fullmatch('abca',s)
        >>> <re.Match object; span=(0, 4), match='abca'>
        re.fullmatch('abc',s)
        >>> None

search
    Help text: Search a string for the presence of a pattern.
    Note: Looks for the pattern anywhere in the string and returns the first match.
    Usage: search(pattern, string, flags=0)
    Example:
        re.search('a',s)
        >>> <re.Match object; span=(0, 1), match='a'>
        re.search('bc',s)
        >>> <re.Match object; span=(1, 3), match='bc'>
        re.search('d',s)
        >>> None

sub
    Help text: Substitute occurrences of a pattern found in a string.
    Note: Replaces every match in the string by default; count limits the number of replacements.
    Usage: sub(pattern, repl, string, count=0, flags=0)
    Example:
        re.sub('a','A',s)
        >>> AbcA
        re.sub('a','A',s,1)
        >>> Abca
        re.sub('abc','ABC',s)
        >>> ABCa
        re.sub('d','A',s)
        >>> abca

subn
    Help text: Same as sub, but also return the number of substitutions made.
    Note: Works like sub and returns a tuple (new string, number of substitutions made).
    Usage: subn(pattern, repl, string, count=0, flags=0)
    Example:
        re.subn('a','A',s)
        >>> ('AbcA', 2)
        re.subn('a','A',s)[1]
        >>> 2
        re.subn('d','A',s)
        >>> ('abca', 0)

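split
    Help text: Split a string by the occurrences of a pattern.
    Note: Splits the string at each match and returns a list. The matched text is dropped, so a match at the start or end of the string leaves an empty string in the result. maxsplit limits the number of splits.
    Usage: split(pattern, string, maxsplit=0, flags=0)
    Example:
        re.split('a',s)
        >>> ['', 'bc', '']
        re.split('a',s,1)
        >>> ['', 'bca']
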
findall
    Help text: Find all occurrences of a pattern in a string.
    Note: Returns all matches in the string as a list.
    Usage: findall(pattern, string, flags=0)
    Example:
        re.findall('a',s)
        >>> ['a', 'a']

finditer
    Help text: Return an iterator yielding a Match object for each match.
    Note: Like findall, but returns an iterator of Match objects instead of a list of strings.
    Usage: finditer(pattern, string, flags=0)
    Example:
        re.finditer('a',s)
        >>> <callable_iterator object at 0x00000253AD29A9A0>
        list(re.finditer('a',s))
        >>> [<re.Match object; span=(0, 1), match='a'>, <re.Match object; span=(3, 4), match='a'>]

compile
    Help text: Compile a pattern into a Pattern object.
    Note: Creates a reusable pattern object.
    Usage: compile(pattern, flags=0)
    Example:
        mo = re.compile('a')
        re.findall(mo,s)
        >>> ['a', 'a']

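purge
    Help text: Clear the regular expression cache.
    Note: The re module keeps a cache of recently compiled patterns; purge() empties that cache. Its practical use is still to be verified.
    Usage: purge()
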
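escape
    Help text: Backslash all non-alphanumerics in a string.
    Note: Escapes the string by adding a backslash before characters that are not letters or digits.
    Usage: escape(pattern)
    Example:
        re.escape('\\')
        >>> '\\\\'
        re.escape('a')
        >>> 'a'
        re.escape('1')
        >>> '1'
        re.escape('#')
        >>> '\\#'
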
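The difference between match, search, and fullmatch is easiest to see when the same pattern is applied with all three. The sketch below is only an illustration using the sample string s = 'abca' from above; the variable names are made up for this example. It also shows that a compiled Pattern object offers the same operations as methods.

```python
import re

s = 'abca'

# match() anchors at the start, search() scans the whole string,
# fullmatch() requires the entire string to match.
pattern = re.compile('bc')

at_start = pattern.match(s)      # None: 'abca' does not start with 'bc'
anywhere = pattern.search(s)     # finds 'bc' at span (1, 3)
whole = pattern.fullmatch(s)     # None: 'bc' is not the whole string

print(at_start, anywhere.span(), whole)

# A compiled Pattern object exposes the module functions as methods.
letter_a = re.compile('a')
all_a = letter_a.findall(s)                            # ['a', 'a']
first_replaced = letter_a.sub('A', s, count=1)         # 'Abca'
positions = [m.start() for m in letter_a.finditer(s)]  # [0, 3]

print(all_a, first_replaced, positions)
```

Compiling is mostly useful when the same pattern is applied many times; for one-off calls the module-level functions behave the same way.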
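
Special characters

The special characters are listed below. The examples use s = 'abcdaabc \n reg'.
Each entry lists the identifier, the English help text, a note (translated from the Chinese description), and usage examples.
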
.
    Help text: Matches any character except a newline.
    Note: Matches any single character except the newline \n; findall returns the matches as a list.
    Example:
        re.findall('.',s)
        >>> ['a', 'b', 'c', 'd', 'a', 'a', 'b', 'c', ' ', ' ', 'r', 'e', 'g']

^
    Help text: Matches the start of the string.
    Note: The pattern must match at the beginning of the string.
    Example:
        re.findall('^a',s)
        >>> ['a']
        re.findall('^b',s)
        >>> []

$
    Help text: Matches the end of the string or just before the newline at the end of the string.
    Note: The pattern must match at the end of the string.
    Example:
        re.findall('g$',s)
        >>> ['g']
        re.findall('a$',s)
        >>> []

*
    Help text: Matches 0 or more (greedy) repetitions of the preceding RE.
    Note: Matches zero or more repetitions of the character or group before the *.
    Example:
        re.findall('a*',s)   # matches zero, one, or several consecutive a
        >>> ['a', '', '', '', 'aa', '', '', '', '', '', '', '', '', '']
        re.findall('abcc*',s)
        >>> ['abc', 'abc']

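+
    Help text: Matches 1 or more (greedy) repetitions of the preceding RE.
    Note: Matches one or more repetitions of the character or group before the +.
    Example:
        re.findall('a+',s)   # matches one or several consecutive a
        >>> ['a', 'aa']
        re.findall('abcc+',s)
        >>> []
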
?
    Help text: Matches 0 or 1 (greedy) of the preceding RE.
    Note: Matches zero or one occurrence of the character or group before the ?.
    Example:
        re.findall('a?',s)   # matches zero or one a
        >>> ['a', '', '', '', 'a', 'a', '', '', '', '', '', '', '', '', '']
        re.findall('abcc?',s)
        >>> ['abc', 'abc']

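*?, +?, ??
    Help text: Non-greedy versions of the previous three special characters.
    Note: A trailing ? turns off greedy matching, so the quantifier matches as few characters as possible.
    Example:
        re.findall('a*?b',s)
        >>> ['ab', 'aab']
        re.findall('a+?b',s)
        >>> ['ab', 'aab']
        re.findall('a??b',s)
        >>> ['ab', 'ab']
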
{m,n}
    Help text: Matches from m to n repetitions of the preceding RE.
    Note: Matches between m and n repetitions of the character or group before the braces.
    Example:
        re.findall('abc{0,0}',s)
        >>> ['ab', 'ab']
        re.findall('abc{0,1}',s)
        >>> ['abc', 'abc']

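{m,n}?
    Help text: Non-greedy version of the above.
    Note: A trailing ? turns off greedy matching, so as few repetitions as possible are matched within the given range.
    Example:
        re.findall('abc{0,1}?',s)
        >>> ['ab', 'ab']
        re.findall('abc{0,1}?d',s)
        >>> ['abcd']
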
\
    Help text: Either escapes special characters or signals a special sequence.
    Note: Matches a special character literally or introduces a special sequence.
    Example:
        re.findall('\\n',s)
        >>> ['\n']

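[]
    Help text: Indicates a set of characters. A "^" as the first character indicates a complementing set.
    Note: Each character inside [] is matched individually, and special characters lose their special meaning there. With ^ as the first character the set is negated: it matches any character that is not listed.
    Example:
        re.findall('[a*b\n]',s)
        >>> ['a', 'b', 'a', 'a', 'b', '\n']
        re.findall('[^a*b\n]',s)
        >>> ['c', 'd', 'c', ' ', ' ', 'r', 'e', 'g']
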
|
    Help text: A|B, creates an RE that will match either A or B.
    Note: Or; matches any one of several alternative patterns.
    Example:
        re.findall('a*?b|\\n',s)
        >>> ['ab', 'aab', '\n']

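(...)
    Help text: Matches the RE inside the parentheses. The contents can be retrieved or matched later in the string.
    Note: Matches the pattern inside () and captures it as a group. Parts of the pattern outside () are still matched but are not captured, so findall returns only the groups.
    Example:
        re.findall('(a*)(bc+)(d?)',s)
        >>> [('a', 'bc', 'd'), ('aa', 'bc', '')]
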
(?aiLmsux)
    Help text: The letters set the corresponding flags defined below.

(?:...)
    Help text: Non-grouping version of regular parentheses.

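(?P<name>...)
    Help text: The substring matched by the group is accessible by name.
    Note: The text matched by the group can be accessed by its name.
    Example:
        re.search('(?P<TEST>.*)',s).groupdict()
        >>> {'TEST': 'abcdaabc '}

(?P=name)
    Help text: Matches the text matched earlier by the group named name.
    Note: Matches the same text again that the named group matched earlier. In the example the only text that can repeat immediately is the empty string.
    Example:
        re.search('(?P<abc>.*)(?P=abc)',s).groupdict()
        >>> {'abc': ''}
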
(?#...)
    Help text: A comment; ignored.
    Note: Comment.

(?=...)
    Help text: Matches if ... matches next, but doesn't consume the string.

(?!...)
    Help text: Matches if ... doesn't match next.

(?<=...)
    Help text: Matches if preceded by ... (must be fixed length).

(?<!...)
    Help text: Matches if not preceded by ... (must be fixed length).

(?(id/name)yes|no)
    Help text: Matches yes pattern if the group with id/name matched, the (optional) no pattern otherwise.

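Several of the extension constructs above have no example yet. The following is a minimal sketch that tries them on the same sample string; the helper starts() and the pattern balanced are hypothetical names introduced only for this illustration.

```python
import re

s = 'abcdaabc \n reg'

# (?:...) groups without capturing, so findall returns whole matches.
non_capturing = re.findall('(?:ab)+c', s)

# (?i) at the start of the pattern turns on IGNORECASE inline.
inline_flag = re.findall('(?i)ABC', s)

# (?#...) is a comment and does not affect matching.
with_comment = re.findall('a(?#one a)b', s)


def starts(pattern):
    """Return the start index of every match of pattern in s."""
    return [m.start() for m in re.finditer(pattern, s)]


ahead = starts('abc(?=d)')        # 'abc' followed by 'd'
not_ahead = starts('abc(?!d)')    # 'abc' not followed by 'd'
behind = starts('(?<=aa)b')       # 'b' preceded by 'aa'
not_behind = starts('(?<!aa)b')   # 'b' not preceded by 'aa'

# (?(1)yes): require ')' only if group 1 matched '('.
balanced = re.compile(r'(\()?\d+(?(1)\))')
conditional = [bool(balanced.fullmatch(t)) for t in ('(12)', '12', '(12', '12)')]

print(non_capturing, inline_flag, with_comment)
print(ahead, not_ahead, behind, not_behind)
print(conditional)
```

The lookaround examples report start positions instead of matched text, because the matched text is 'abc' or 'b' in every case and only the position shows which occurrence was accepted.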
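
Special sequences

The special sequences consist of "\" and a character from the list below. If the ordinary character is not on the list, then the resulting RE will match the second character. The sample string s = '' is left empty because examples for this section are still to be added.
Each entry lists the identifier, the English help text, and a note (translated from the Chinese description).
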
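\number
    Help text: Matches the contents of the group of the same number.
    Note: Matches the text captured by the group with that number.

\A
    Help text: Matches only at the start of the string.
    Note: Anchors the pattern at the start of the string, similar to ^.

\Z
    Help text: Matches only at the end of the string.
    Note: Anchors the pattern at the end of the string, similar to $.

\b
    Help text: Matches the empty string, but only at the start or end of a word.
    Note: Matches a word boundary, that is, the empty position between a word character and a non-word character or the edge of the string. It does not match the space itself.

\B
    Help text: Matches the empty string, but not at the start or end of a word.
    Note: Matches an empty position that is not a word boundary.

\d
    Help text: Matches any decimal digit; equivalent to the set [0-9] in bytes patterns or string patterns with the ASCII flag. In string patterns without the ASCII flag, it will match the whole range of Unicode digits.
    Note: Matches any digit; equivalent to [0-9] when the ASCII flag is set.

\D
    Help text: Matches any non-digit character; equivalent to [^\d].
    Note: Matches any character that is not a digit.

\s
    Help text: Matches any whitespace character; equivalent to [ \t\n\r\f\v] in bytes patterns or string patterns with the ASCII flag. In string patterns without the ASCII flag, it will match the whole range of Unicode whitespace characters.
    Note: Matches any whitespace character; equivalent to [ \t\n\r\f\v] when the ASCII flag is set.

\S
    Help text: Matches any non-whitespace character; equivalent to [^\s].
    Note: Matches any character that is not whitespace.

\w
    Help text: Matches any alphanumeric character; equivalent to [a-zA-Z0-9_] in bytes patterns or string patterns with the ASCII flag. In string patterns without the ASCII flag, it will match the range of Unicode alphanumeric characters (letters plus digits plus underscore). With LOCALE, it will match the set [0-9_] plus characters defined as letters for the current locale.
    Note: Matches letters, digits, and the underscore.

\W
    Help text: Matches the complement of \w.
    Note: Matches any character that is not a letter, digit, or underscore.

\\
    Help text: Matches a literal backslash.
    Note: Matches a backslash.
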
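The special sequences have no examples in the original notes. As a rough illustration, the sketch below tries the most common ones; the sample strings (text, lines, mixed) are invented for this example and are not from the original post. Raw strings (r'...') are used so that the backslashes reach the regex engine unchanged.

```python
import re

text = 'cat concat 42 cat_7'   # hypothetical sample string

# \b is a zero-width word boundary; \B is any position that is not one.
whole_word = [m.start() for m in re.finditer(r'\bcat\b', text)]
inside_word = [m.start() for m in re.finditer(r'\Bcat', text)]

digits = re.findall(r'\d+', text)
words = re.findall(r'\w+', text)      # '_' counts as a word character
spaces = re.findall(r'\s', text)

# \1 matches the same text that group 1 just captured (a doubled letter).
doubled = re.findall(r'(\w)\1', 'hello book')

# With MULTILINE, ^ matches at every line start, \A only at the string start.
lines = 'cat\ncat'
caret = re.findall(r'^cat', lines, re.M)
start_only = re.findall(r'\Acat', lines, re.M)

# \d matches Unicode digits in str patterns unless the ASCII flag is set.
mixed = '4\u0664'                     # ASCII 4 and ARABIC-INDIC DIGIT FOUR
unicode_digits = re.findall(r'\d', mixed)
ascii_digits = re.findall(r'\d', mixed, re.A)

print(whole_word, inside_word, digits, words, len(spaces))
print(doubled, caret, start_only, len(unicode_digits), ascii_digits)
```

Note that 'cat_7' is not reported by \bcat\b: the underscore is a word character, so there is no boundary after the t.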
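Notes

Greedy means that it will match as many repetitions as possible.
pattern: the pattern string, that is, the regular expression to match.
flags: flag bits that control how the regular expression is matched:
1. re.I (re.IGNORECASE): ignore case.
2. re.M (re.MULTILINE): multi-line mode; changes '^' and '$' so that they also match at the start and end of each line.
3. re.S (re.DOTALL): changes '.' so that it matches any character, including a newline.
4. re.L (re.LOCALE): makes \w \W \b \B and case-insensitive matching depend on the current locale; in Python 3 it can only be used with bytes patterns.
5. re.U (re.UNICODE): makes \w \W \b \B \s \S \d \D follow the Unicode character properties; in Python 3 this is already the default for str patterns, and re.A (re.ASCII) restricts them to ASCII.
6. re.X (re.VERBOSE): verbose mode. The regular expression can span several lines, whitespace is ignored, and comments can be added.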
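The effect of the flags is easier to see side by side. The following sketch compares each flag against the default behavior; the sample strings and the date pattern are invented for this illustration.

```python
import re

# re.I: ignore case.
case_default = re.findall('abc', 'ABC abc')
case_ignored = re.findall('abc', 'ABC abc', re.I)

# re.M: '^' also matches at the start of every line.
line_default = re.findall('^a', 'a\na')
line_multi = re.findall('^a', 'a\na', re.M)

# re.S: '.' also matches the newline.
dot_default = re.findall('.', 'a\nb')
dot_all = re.findall('.', 'a\nb', re.S)

# re.X: whitespace and '#' comments inside the pattern are ignored.
date = re.compile(r"""
    (\d{4})   # year
    -         # literal separator
    (\d{2})   # month
""", re.X)
year_month = date.match('2024-05').groups()

# Flags are combined with the | operator.
combined = re.findall('^abc$', 'ABC\nabc', re.I | re.M)

print(case_default, case_ignored)
print(line_default, line_multi)
print(dot_default, dot_all)
print(year_month, combined)
```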

Examples

  • Matching an ID card number by groups
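
The original post gives no code for this example. One possible sketch, assuming the 18-character layout of a mainland China resident ID number (6-digit region code, 8-digit birth date, 3-digit sequence number, and one check character that is a digit or X), uses named groups. The pattern name id_pattern, the group names, and the sample number are chosen only for illustration. The pattern checks the layout only; it does not validate the date or the check character.

```python
import re

# Assumed layout: region (6 digits) + birth date YYYYMMDD (8 digits)
# + sequence number (3 digits) + check character (digit or X).
id_pattern = re.compile(
    r'(?P<region>\d{6})'
    r'(?P<birth>\d{8})'
    r'(?P<seq>\d{3})'
    r'(?P<check>[\dXx])',
    re.ASCII,   # restrict \d to 0-9
)

sample = '11010519491231002X'   # sample value for illustration only

m = id_pattern.fullmatch(sample)
parts = m.groupdict() if m else None
print(parts)

# A string of the wrong length does not match at all.
too_short = id_pattern.fullmatch('1101051949123100')
print(too_short)
```

fullmatch is used instead of match so that trailing characters after the 18th position cause the match to fail.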
