Python re: Just Enough Regex



1. References

Python Regular Expression Guide (Python正则表达式指南)

https://docs.python.org/2/library/re.html

https://docs.python.org/3/library/re.html

string method                           | re function                           | Notes
-                                       | re.match(pattern, string, flags=0)    | matches only at the start of the string
S.find(sub [,start [,end]]) -> int      | re.search(pattern, string, flags=0)   | scans through the string looking for a match
S.replace(old, new[, count]) -> string  | re.findall(pattern, string, flags=0)  | see also re.finditer
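
To make the comparison concrete, here is a small sketch that runs the string methods and their re counterparts side by side. The sample string is borrowed from the later sections; the printed values in the comments are what these calls return.

```python
import re

s = "pro---gram-files"

# re.match only succeeds when the pattern matches at position 0.
print(re.match(r"gram", s))            # None
print(re.match(r"pro", s).group())     # pro

# re.search scans the whole string, much like str.find but with a pattern.
print(s.find("gram"))                  # 6
print(re.search(r"gram", s).start())   # 6

# re.findall returns every non-overlapping match as a list of strings.
print(re.findall(r"-+", s))            # ['---', '-']

# re.finditer yields match objects, so positions are available as well.
print([m.span() for m in re.finditer(r"-+", s)])   # [(3, 6), (10, 11)]
```

Note that re.match returning None is the common case, so it is worth testing the result before calling .group() on it.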

 

2. Groups: m.group()

In [560]: m.group?
Docstring:
group([group1, ...]) -> str or tuple.
Return subgroup(s) of the match by indices or names.
For 0 returns the entire match.
Type:      builtin_function_or_method

In [542]: m = re.search(r'(-{1,2}(gr))', 'pro---gram-files')

In [543]: m.group()  # no argument: same as m.group(0)
Out[543]: '--gr'

In [544]: m.group(0)  # group 0 always exists and is the entire match; note that m.string is the full text that was searched
Out[544]: '--gr'

In [545]: m.group(1)
Out[545]: '--gr'

In [546]: m.group(2)
Out[546]: 'gr'

In [547]: m.group(3)  # the pattern has only two '(' groups, so asking for group 3 raises an error
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-547-71a2c7935517> in <module>()
----> 1 m.group(3)

IndexError: no such group

In [548]: m.group(1, 2)  # select several groups; returns a tuple
Out[548]: ('--gr', 'gr')

In [549]: m.groups()  # all groups
Out[549]: ('--gr', 'gr')
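
One detail the session above does not show is what happens when a group exists in the pattern but takes no part in the match. The sketch below uses a made-up pattern and input to illustrate that groups are numbered by their opening parenthesis and that an unused group comes back as None rather than raising.

```python
import re

# Groups are numbered by the position of their opening parenthesis:
# group 1 = ((a)|(b)), group 2 = (a), group 3 = (b).
m = re.search(r"((a)|(b))c", "xbc")

print(m.group(0))   # bc
print(m.groups())   # ('b', None, 'b')  -> group 2 did not participate
print(m.group(2))   # None (no IndexError, because the group exists)
print(m.span(1))    # (1, 2)
```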

 

m.groupdict() is used with named groups.

In [557]: m.groupdict?
Docstring:
groupdict([default=None]) -> dict.
Return a dictionary containing all the named subgroups of the match,
keyed by the subgroup name. The default argument is used for groups
that did not participate in the match
Type:      builtin_function_or_method

In [558]: m = re.search(r'(-{1,2}(?P<GR>gr))', 'pro---gram-files')

In [559]: m.groupdict()  # only the named group appears; the unnamed outer group is left out
Out[559]: {'GR': 'gr'}
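
The default argument mentioned in the docstring is easier to see with an optional named group. This is a minimal sketch; the group names year, month and day and the date-like input are illustrative assumptions, not part of the original example.

```python
import re

# The whole "-day" part is optional, so the named group "day" may not participate.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})(-(?P<day>\d{2}))?", "2024-05")

print(m.group("year"))      # 2024  (named groups can be fetched by name)
print(m.groupdict())        # {'year': '2024', 'month': '05', 'day': None}
print(m.groupdict("01"))    # {'year': '2024', 'month': '05', 'day': '01'}
```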

 

3. Extraction: re.findall()

re.findall(pattern, string, flags=0)

In [97]: text = "He was carefully disguised but captured quickly by police."

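In [98]: re.findall(r"\w+ly", text)  # no groups: each item is the whole match, like m.group(0)
Out[98]: ['carefully', 'quickly']

In [99]: re.findall(r"(\w+)ly", text)  # one group: each item is that group's content, like m.group(1)
Out[99]: ['careful', 'quick']

In [100]: re.findall(r"((\w+)(ly))", text)  # several groups, counted by '(' from left to right: each item is a tuple, like m.groups()
Out[100]: [('carefully', 'careful', 'ly'), ('quickly', 'quick', 'ly')]

In [102]: re.findall(r"((1\w+)(ly))", text)  # nothing matches: an empty list is returned
Out[102]: []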
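
Because the return shape of re.findall depends on how many groups the pattern has, two related tools are often handy: a non-capturing group (?:...) when parentheses are needed only for grouping, and re.finditer when both the whole match and its groups are wanted. A short sketch using the same text:

```python
import re

text = "He was carefully disguised but captured quickly by police."

# (?:...) groups without capturing, so findall still returns whole matches.
print(re.findall(r"\w+(?:ly)", text))   # ['carefully', 'quickly']

# finditer yields match objects: group(0) and group(1) are both available.
pairs = [(m.group(0), m.group(1)) for m in re.finditer(r"(\w+)ly", text)]
print(pairs)   # [('carefully', 'careful'), ('quickly', 'quick')]
```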

4. Substitution: re.sub()

re.sub(pattern, repl, string, count=0, flags=0)

Backreferences in repl, such as \6, are replaced with the substring matched by group 6 in the pattern. The same result can be produced by passing a function as repl.

In [158]: def func(m):
     ...:     return m.group('DEF') + ' ' + m.group(2)  # groups can be fetched by name or by number
     ...:

In [159]: re.sub(r'(?P<DEF>def)\s+([a-z]+)\s*\(\s*\):', func, 'def func(): def f():')
Out[159]: 'def func def f'

In [160]: re.sub(r'(?P<DEF>def)\s+([a-z]+)\s*\(\s*\):', r'\1 \2', 'def func(): def f():')  # a bare \DEF is not supported in repl; a named group is written \g<DEF>
Out[160]: 'def func def f'
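
For completeness, named groups can be referenced in repl with the \g<name> syntax, and count limits how many replacements are made. The sketch below reuses the same pattern and input; the variable names src and pattern are only for illustration.

```python
import re

src = "def func(): def f():"
pattern = r"(?P<DEF>def)\s+([a-z]+)\s*\(\s*\):"

# \g<DEF> refers to the named group; \2 still works for the unnamed one.
print(re.sub(pattern, r"\g<DEF> \2", src))            # def func def f

# count=1 replaces only the first occurrence.
print(re.sub(pattern, r"\g<DEF> \2", src, count=1))   # def func def f():

# re.subn also reports how many replacements were made.
print(re.subn(pattern, r"\g<DEF> \2", src))           # ('def func def f', 2)
```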

 

5. Backreferences in the pattern

5.1 Finding a pair, as in a hand of cards

In [204]: re.search(r'(.).*\1', 'ab123')  # no character repeats: returns None, so nothing is displayed

In [205]: re.search(r'(.).*\1', 'ab121')
Out[205]: <_sre.SRE_Match at 0x71ca120>

In [206]: _.group()
Out[206]: '121'
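
A backreference can also be written by name with (?P=name), and the greedy .* can be made lazy with .*? so that the nearest repeat is found. The following is a sketch only; first_pair is a hypothetical helper, not something from the re module.

```python
import re

def first_pair(cards):
    """Return the first character that occurs again later, or None."""
    # (?P=card) must match exactly what the group named "card" captured.
    m = re.search(r"(?P<card>.).*?(?P=card)", cards)
    return m.group("card") if m else None

print(first_pair("ab123"))   # None
print(first_pair("ab121"))   # 1

# Greedy vs. lazy: .* stretches to the last repeat, .*? stops at the nearest.
print(re.search(r"(.).*\1", "1213121").group())    # 1213121
print(re.search(r"(.).*?\1", "1213121").group())   # 121
```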

 

5.2 Several identical characters in a row

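In [207]: re.search(r'.{3}', '1122')  # wrong: matches any three characters
Out[207]: <_sre.SRE_Match at 0x71b94a8>

In [208]: re.search(r'(.){3}', '1122')  # wrong: the group itself is repeated, so each repetition may match a different character
Out[208]: <_sre.SRE_Match at 0x71ca198>

In [209]: re.search(r'(.)\1\1', '1122')  # correct: no character occurs three times in a row, so there is no match

In [210]: re.search(r'(.)\1\1', '1112')
Out[210]: <_sre.SRE_Match at 0x71ca210>

In [211]: re.search(r'(.)\1{2}', '1112')  # same pattern, written with a repeat count on the backreference
Out[211]: <_sre.SRE_Match at 0x71ca288>

In [212]: _.group()
Out[212]: '111'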
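
The difference between repeating a group and repeating a backreference can be checked directly. In the sketch below, runs is a hypothetical helper (and min_len an assumed parameter) that collects every run of a repeated character using the (.)\1{n,} form shown above.

```python
import re

# (.){3} repeats the group, so any three characters match;
# the group ends up holding only the last repetition.
m = re.search(r"(.){3}", "1122")
print(m.group(), m.group(1))   # 112 2

def runs(s, min_len=2):
    """Return all runs of one character repeated at least min_len times."""
    # One captured character followed by at least (min_len - 1) copies of it.
    pattern = r"(.)\1{%d,}" % (min_len - 1)
    return [match.group() for match in re.finditer(pattern, s)]

print(runs("1122"))              # ['11', '22']
print(runs("1112", min_len=3))   # ['111']
print(runs("1122", min_len=3))   # []
```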

 

