Regular expressions: the re package (2018-10-02)

Posted qiulinzhang

Reference (official documentation): Regular expression operations

re: regular expression, often abbreviated to regex

  1. Regular expression syntax: see the tutorial by deerchao, version v2.3.5 (2017-6-12); http://deerchao.net/tutorials/regex/regex.htm
    -------------------------------------------------------------------------------------
  2. What regular expressions are for: a regular expression is mainly used to search a string for the content you want by means of a specific pattern.
    -------------------------------------------------------------------------------------

Commonly used re functions:

  • re.compile(pattern, flags=0)
    Converts a regular expression pattern into a regular expression object.
    Compile a regular expression pattern into a regular expression object, which can be used for matching using its match(), search() and other methods, described below.
prog = re.compile(pattern)
result = prog.match(string)

is equivalent to

result = re.match(pattern, string)
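
The equivalence above can be checked with a small sketch. The pattern and the sample strings are made up for illustration and do not come from the original post; compiling once is mainly convenient when the same pattern is reused many times.

```python
import re

# Hypothetical pattern and sample data, chosen only for illustration.
word_with_digits = re.compile(r"[a-z]+\d+")
samples = ["abc123", "123abc", "x9 tail"]

# Calling match() on the compiled object ...
compiled_results = [bool(word_with_digits.match(s)) for s in samples]
# ... should give the same answers as the module-level re.match().
module_results = [bool(re.match(r"[a-z]+\d+", s)) for s in samples]

print(compiled_results)
print(module_results)
```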

-------------------------------------------------------------------------------------

  • re.search(pattern, string, flags=0)
    Finds the first place in string where pattern matches.
    Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.
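
    A minimal sketch of the difference between "no match" (None) and a zero-length match, using made-up sample strings:

```python
import re

text = "foo 42 bar 7"  # made-up sample string

# search() returns a match object for the first (leftmost) match only
first = re.search(r"\d+", text)
print(first.group(), first.span())

# No digit anywhere in the string: search() returns None
missing = re.search(r"\d+", "no digits here")

# r"\d*" can match the empty string, so search() succeeds with a
# zero-length match at position 0 instead of returning None
empty = re.search(r"\d*", "no digits here")
print(missing, empty.span())
```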
    -------------------------------------------------------------------------------------
  • re.match(pattern, string, flags=0)
    Matches pattern only at the beginning of string; unlike search(), it cannot match at an arbitrary position.
    If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding match object. Return None if the string does not match the pattern; note that this is different from a zero-length match.
    Note that even in MULTILINE mode, re.match() will only match at the beginning of the string and not at the beginning of each line.
    If you want to locate a match anywhere in string, use search() instead (see also search() vs. match()).
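
    The MULTILINE remark is easy to misread, so here is a short sketch with a made-up two-line string:

```python
import re

text = "first line\nsecond line"  # made-up two-line sample

# match() only tries position 0 of the string
at_start = re.match(r"first", text)        # a match object
not_at_start = re.match(r"second", text)   # None

# Even with MULTILINE and ^, match() does not try the start of line 2
multiline_match = re.match(r"^second", text, flags=re.MULTILINE)

# search() scans the string; with MULTILINE, ^ also matches after a newline
multiline_search = re.search(r"^second", text, flags=re.MULTILINE)

print(at_start, not_at_start, multiline_match, multiline_search)
```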
    -------------------------------------------------------------------------------------
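  • re.split(pattern, string, maxsplit=0, flags=0)
    Splits string at each place where pattern matches. If pattern contains capturing parentheses, the matched separators are returned as well.
    In the examples below, '\W+' matches one or more characters that are not letters, digits or underscores, so it matches each comma together with the space after it, and the string is split there. The final '.' is matched too, which is why the result ends with an empty string. The second example adds parentheses, so the matched separators are returned as well. The third example limits the number of splits with maxsplit.
>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split(r'(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
>>> re.split(r'\W+', 'Words, words, words.', maxsplit=1)
['Words', 'words, words.']
>>> re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)
['0', '3', '9']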

If there are capturing groups in the separator and it matches at the start of the string, the result will start with an empty string. The same holds for the end of the string. In other words, a separator match at the very beginning or very end of the string adds an empty string to the result:

>>> re.split(r'(\W+)', '...words, words...')
['', '...', 'words', ', ', 'words', '...', '']
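
One consequence of those empty strings is that, with a capturing group, the pieces alternate strictly between text and separator, so the original string can be rebuilt by joining them. A small sketch of this (the variable names are illustrative only):

```python
import re

text = "...words, words..."
parts = re.split(r"(\W+)", text)

# With a capturing group, parts alternates: text, separator, text, ...
fields = parts[0::2]      # the pieces between separators
separators = parts[1::2]  # the captured separators

# Joining every piece back together reproduces the input exactly
rebuilt = "".join(parts)

print(fields)
print(separators)
print(rebuilt == text)
```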

-------------------------------------------------------------------------------------

  • re.sub(pattern, repl, string, count=0, flags=0)
    Replaces the non-overlapping matches of pattern in string with repl:
    Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn't found, string is returned unchanged. repl can be a string or a function; if it is a string, any backslash escapes in it are processed. That is, \n is converted to a single newline character, \r is converted to a carriage return, and so forth. Unknown escapes such as \& are left alone. Backreferences, such as \6, are replaced with the substring matched by group 6 in the pattern. For example:
>>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):',
...        r'static PyObject*\npy_\1(void)\n{',
...        'def myfunc():')
'static PyObject*\npy_myfunc(void)\n{'

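Here the whole of def myfunc(): is matched, but ([a-zA-Z_][a-zA-Z_0-9]*) is wrapped in parentheses, so the myfunc it matches becomes group 1. repl then replaces the matched text; since the entire string was matched, all of it is replaced, and group 1 is substituted at the position of \1 in the replacement.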
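
The same backreference mechanism in a smaller sketch, reordering three groups. The date string is only sample input, and the second call shows the count parameter from the signature above:

```python
import re

date = "2018-10-02"  # sample input for illustration

# \1, \2, \3 in repl refer to the first, second and third group
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", date)

# count=1 replaces only the leftmost match
once = re.sub(r"\d+", "N", date, count=1)

print(swapped)
print(once)
```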

When repl is a function:
If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string. For example:

>>> def dashrepl(matchobj):
...     if matchobj.group(0) == '-': return ' '
...     else: return '-'
>>> re.sub('-{1,2}', dashrepl, 'pro----gram-files')
'pro--gram files'
>>> re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE)
'Baked Beans & Spam'
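
A function is useful when the replacement has to be computed from the matched text. The sketch below is an illustration under assumed names (double and the sample sentence are not from the original post): it doubles every number in a string.

```python
import re

def double(matchobj):
    # matchobj.group(0) is the full text of the current match
    return str(int(matchobj.group(0)) * 2)

# double() is called once for each non-overlapping match of \d+
result = re.sub(r"\d+", double, "3 apples and 12 pears")
print(result)
```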