Python re module and basic regular expressions

Posted Life is an Attitude


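1. Iterator: an object that implements the __iter__() and __next__() methods internally, so that calling next() on it steps through its items one at a time.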
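As a rough illustration of the iterator protocol described above, here is a minimal sketch. The class name Countdown is made up for this example and does not come from any library.

```python
class Countdown:
    """Hypothetical iterator that counts down from `start` to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):
        # Signal exhaustion with StopIteration, as the protocol requires.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


it = Countdown(3)
first = next(it)   # 3
rest = list(it)    # [2, 1], the remaining items
```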

 

Python regular expressions

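1. Python supports regular expressions through the re module.

2. To see which Python modules are available on the current system: help('modules')

help(): can be used in two ways (in interactive mode, or as a function call)

Example: interactive mode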

>>> help()

 

Welcome to Python 3.5's help utility!

 

If this is your first time using Python, you should definitely check out

the tutorial on the Internet at http://docs.python.org/3.5/tutorial/.

 

Enter the name of any module, keyword, or topic to get help on writing

Python programs and using Python modules.  To quit this help utility and

return to the interpreter, just type "quit".

 

To get a list of available modules, keywords, symbols, or topics, type

"modules", "keywords", "symbols", or "topics".  Each module also comes

with a one-line summary of what it does; to list the modules whose name

or summary contain a given string such as "spam", type "modules spam".

 

help> modules

 

Function call

help('modules')

 

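3. Regular expression metacharacters

. : matches any single character (except a newline, unless re.S is set)

[]: matches any character in the given range, e.g. [0-9]

[^]: matches any character outside the given range, e.g. [^0-9]

? : matches the preceding item 0 or 1 times

+ : matches the preceding item 1 or more times

* : matches the preceding item 0 or more times

{m}: exactly m times

{m,n}: at least m and at most n times

{m,}: at least m times

{,n}: 0 to n times

{0,}: 0 or more times (same as *)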
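The quantifiers above can be tried out with a small sketch like the following. The sample strings and the helper name matching are made up for illustration; re.fullmatch() is used so that the whole string has to match.

```python
import re

samples = ["ac", "abc", "abbc", "abbbc"]


def matching(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]


optional = matching(r"ab?c")          # ['ac', 'abc']
one_or_more = matching(r"ab+c")       # ['abc', 'abbc', 'abbbc']
zero_or_more = matching(r"ab*c")      # all four samples
exactly_two = matching(r"ab{2}c")     # ['abbc']
two_to_three = matching(r"ab{2,3}c")  # ['abbc', 'abbbc']
up_to_one = matching(r"ab{,1}c")      # ['ac', 'abc']

# A negated range keeps everything that is not a digit.
not_digit = re.findall(r"[^0-9]", "a1b2")  # ['a', 'b']
```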

 

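Boundary matching:

^: matches the start of the string; in multiline mode also matches the start of each line

$: matches the end of the string; in multiline mode also matches the end of each line

\A: matches only at the start of the string

\Z: matches only at the end of the string

\b: matches a word boundary, i.e. the position between \w and \W

\B: matches any position that is not a word boundary (the opposite of \b)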
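A short sketch of how the anchors above behave, using made-up sample text. Note how ^ changes meaning under re.M while \A does not.

```python
import re

text = "first line\nsecond line"

# Without re.M, ^ only matches at the very start of the string.
starts_default = re.findall(r"^\w+", text)            # ['first']
# With re.M, ^ also matches right after each newline.
starts_multiline = re.findall(r"^\w+", text, re.M)    # ['first', 'second']
# \A ignores re.M and still matches only at the start of the string.
starts_absolute = re.findall(r"\A\w+", text, re.M)    # ['first']

words = "cat concat cats"
# \b on both sides: only the standalone word "cat" (index 0) qualifies.
whole_word_at = re.search(r"\bcat\b", words).start()  # 0
# \B on the left: "cat" must be preceded by a word character ("concat").
inside_word_at = re.search(r"\Bcat\b", words).start() # 7
```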

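Predefined character classes:

\d: digit [0-9]

\D: non-digit

\s: whitespace character [<space>\t\r\n\f\v]

\S: non-whitespace character

\w: word character [A-Za-z0-9_]

\W: non-word character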
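For instance, the predefined classes can be used as in this small sketch (the log-like sample line is invented for illustration):

```python
import re

line = "user_42 logged in\tat 09:15"

digits = re.findall(r"\d+", line)      # ['42', '09', '15']
# \w includes the underscore, so "user_42" stays in one piece.
words = re.findall(r"\w+", line)       # ['user_42', 'logged', 'in', 'at', '09', '15']
# \s+ treats spaces and the tab alike.
fields = re.split(r"\s+", line)        # ['user_42', 'logged', 'in', 'at', '09:15']
non_word = re.findall(r"\W", "a-b c")  # ['-', ' ']
```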

 

Regular expressions are greedy by default.

To switch to non-greedy mode, append a ? to the quantifier, e.g. a*? (general form: (*|+|?|{})?)
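The difference between greedy and non-greedy matching is easiest to see on a string with several delimiters. A minimal sketch with a made-up HTML-like string:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last ">" in the string.
greedy = re.findall(r"<.*>", html)   # ['<b>bold</b> and <i>italic</i>']
# Non-greedy: .*? stops at the first ">" it can.
lazy = re.findall(r"<.*?>", html)    # ['<b>', '</b>', '<i>', '</i>']
```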

 

4. Calling the built-in functions of re to do regular expression matching

 

5. The match object:

match(pattern, string, flags=0)

    Try to apply the pattern at the start of the string, returning

    a match object, or None if no match was found.

 

 

m = re.match('a', 'abc')

 

All attributes and methods:

m.end        m.group      m.lastgroup  m.re         m.start

m.endpos     m.groupdict  m.lastindex  m.regs       m.string

m.expand     m.groups     m.pos        m.span       

 

 

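m.group(): returns the matched text; group is a method

m.groups(): returns all captured groups as a tuple (an optional argument, as in m.groups(1), sets the default value for groups that did not take part in the match)

m.pos (pos: position): the index at which the search started

m.endpos: the index at which the search was to stop

 

m.start(): the start index, in the original string, of the substring captured by the pattern

m.end(): the end index (exclusive), in the original string, of the substring captured by the pattern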
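To make the match object methods above concrete, here is a small sketch. The date-like input string is invented for illustration.

```python
import re

m = re.match(r"(\d{4})-(\d{2})", "2024-05-17 release")

whole = m.group()                  # '2024-05', the full match
year = m.group(1)                  # '2024', the first group
parts = m.groups()                 # ('2024', '05')
span = (m.start(), m.end())        # (0, 7)
# pos/endpos describe the searched region, here the whole string.
search_window = (m.pos, m.endpos)  # (0, 18)

# match() returns None when the pattern does not fit at the start,
# so check the result before calling methods on it.
no_match = re.match(r"\d", "abc")  # None
```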

 

 

6. search: scans the string for the pattern and returns a match object for the first match only

search(pattern, string, flags=0)

    Scan through string looking for a match to the pattern, returning

    a match object, or None if no match was found.
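A quick sketch of how search differs from match, reusing the www.baidu.com string from the examples in this post:

```python
import re

text = "www.baidu.com"

# match() only tries at the start of the string, so this fails.
at_start = re.match(r"baidu", text)    # None
# search() scans forward and finds the first occurrence.
anywhere = re.search(r"baidu", text)
found = anywhere.group()               # 'baidu'
where = anywhere.span()                # (4, 9)
```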

7. findall: finds all matches and returns them as a list

findall(pattern, string, flags=0)

    Return a list of all non-overlapping matches in the string.

    

    If one or more capturing groups are present in the pattern, return

    a list of groups; this will be a list of tuples if the pattern

    has more than one group.

    

    Empty matches are included in the result.
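The effect of capturing groups on the result of findall, as described in the docstring above, can be sketched like this (the key=value sample text is made up):

```python
import re

text = "alice=30, bob=25"

# No groups: a list of whole matches.
no_group = re.findall(r"\w+=\d+", text)         # ['alice=30', 'bob=25']
# One group: a list of that group's text only.
one_group = re.findall(r"\w+=(\d+)", text)      # ['30', '25']
# Several groups: a list of tuples.
two_groups = re.findall(r"(\w+)=(\d+)", text)   # [('alice', '30'), ('bob', '25')]
```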

 

8. finditer (not used often)

finditer(pattern, string, flags=0)

    Return an iterator over all non-overlapping matches in the

    string.  For each match, the iterator returns a match object.

    

    Empty matches are included in the result.
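finditer is handy when the position of each match matters, since every item is a full match object. A minimal sketch with an invented sample string:

```python
import re

text = "a1 b22 c333"

# Each item yielded by finditer() is a match object, so start()/end()
# are available in addition to the matched text.
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]
# [('1', 1, 2), ('22', 4, 6), ('333', 8, 11)]
```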

 

9. split

split(pattern, string, maxsplit=0, flags=0)

    Split the source string by the occurrences of the pattern,

    returning a list containing the resulting substrings.  If

    capturing parentheses are used in pattern, then the text of all

    groups in the pattern are also returned as part of the resulting

    list.  If maxsplit is nonzero, at most maxsplit splits occur,

    and the remainder of the string is returned as the final element

    of the list.

   Example: a = re.split(r'\.', 'www.baidu.com')
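The behaviour of capturing parentheses and maxsplit mentioned in the docstring can be sketched as follows, using the same host name:

```python
import re

host = "www.baidu.com"

parts = re.split(r"\.", host)                    # ['www', 'baidu', 'com']
# A capturing group keeps the separators in the result.
with_seps = re.split(r"(\.)", host)              # ['www', '.', 'baidu', '.', 'com']
# maxsplit=1 splits once and returns the rest as the last element.
first_only = re.split(r"\.", host, maxsplit=1)   # ['www', 'baidu.com']
```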

 

10. sub: find and replace

sub(pattern, repl, string, count=0, flags=0)

    Return the string obtained by replacing the leftmost

    non-overlapping occurrences of the pattern in string by the

    replacement repl.  repl can be either a string or a callable;

    if a string, backslash escapes in it are processed.  If it is

    a callable, it's passed the match object and must return

    a replacement string to be used.

   Example: In [47]: re.sub('baidu', 'BAIDU', 'www.baidu.com')

   Out[47]: 'www.BAIDU.com'
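Beyond a plain string replacement, repl can contain backreferences or be a callable. A small sketch follows; the function name double and the sample strings are made up for illustration.

```python
import re

# Backreferences in repl reorder the captured groups.
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2024-05-17")
# '17/05/2024'


def double(match):
    """Hypothetical callback: replace each number with twice its value."""
    return str(int(match.group()) * 2)


doubled = re.sub(r"\d+", double, "3 apples and 10 pears")
# '6 apples and 20 pears'

# count limits how many occurrences are replaced, from the left.
limited = re.sub(r"a", "A", "banana", count=2)   # 'bAnAna'
```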

11. subn: find and replace, and also report the number of replacements made

Example:

In [48]: re.subn('baidu', 'BAIDU', 'www.baidu.com')

Out[48]: ('www.BAIDU.com', 1)

 

 

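flags:

re.I (IGNORECASE): ignore letter case

re.M (MULTILINE): multiline matching (^ and $ also match at the start and end of each line)

re.A (ASCII): make \w, \W, \b, \B, \d, \D, \s and \S match ASCII characters only

re.U (UNICODE): make \w, \W and the related classes follow Unicode rules (already the default for str patterns in Python 3)

re.S (DOTALL): "." matches any character at all, including the newline.

re.X (VERBOSE): Ignore whitespace and comments for nicer looking REs. Comments may be written inside the pattern; whitespace in the pattern is ignored unless it is escaped or inside a character class.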
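The flags listed above can be combined with |. The following is only a sketch with made-up sample text, showing each flag's effect in isolation:

```python
import re

text = "Hello\nworld"

ignore_case = re.findall(r"hello", text, re.I)           # ['Hello']
# By default "." does not match the newline between the two words.
dot_default = re.search(r"Hello.world", text)            # None
dot_all = re.search(r"Hello.world", text, re.S)          # matches
# Flags are combined with the bitwise OR operator.
combined = re.findall(r"^[a-z]+$", text, re.I | re.M)    # ['Hello', 'world']

# re.X lets the pattern be laid out over several lines with comments.
number = re.compile(r"""
    \d+      # integer part
    \.       # literal dot
    \d+      # fractional part
""", re.X)
price = number.search("total: 12.50 USD").group()        # '12.50'

# re.A restricts \w to ASCII, so the accented letter ends the word.
unicode_words = re.findall(r"\w+", "caf\u00e9")          # ['café']
ascii_words = re.findall(r"\w+", "caf\u00e9", re.A)      # ['caf']
```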

 

12. Non-capturing groups:

xxx(?:)xxx

 

?: : groups part of the pattern without capturing it

?P<> : gives the group a name

 (?P<name>...)

Similar to regular parentheses, but the substring matched by the group is accessible via the symbolic group name name. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named.

 

Named groups can be referenced in three contexts. If the pattern is (?P<quote>['"]).*?(?P=quote) (i.e. matching a string quoted with either single or double quotes):

 

Context of reference to group "quote": ways to reference it

in the same pattern itself: (?P=quote) (as shown), \1

when processing match object m: m.group('quote'), m.end('quote') (etc.)

in a string passed to the repl argument of re.sub(): \g<quote>, \g<1>, \1
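The three ways of referencing a named group, plus the non-capturing (?:...) form, can be sketched as follows. The sample sentences are invented; the quote pattern is the one quoted from the documentation above.

```python
import re

pattern = re.compile(r"(?P<quote>['\"]).*?(?P=quote)")

# 1) Inside the pattern: (?P=quote) requires the same quote character again.
m = pattern.search("say 'hi' now")
quoted = m.group()               # "'hi'"

# 2) On the match object: by name or by number.
quote_char = m.group("quote")    # "'"
same_by_number = m.group(1)      # "'"
quote_end = m.end("quote")       # 5

# 3) In the repl string of sub(): \g<quote> inserts the captured quote.
emptied = pattern.sub(r"\g<quote>\g<quote>", 'say "hi" now')   # 'say "" now'

# (?:...) groups without capturing, so findall() only reports the
# one remaining capturing group.
digits = re.findall(r"(?:ab)+(\d)", "abab1 ab2")               # ['1', '2']
```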

 
