The re module

1. A first look at the re module
Name        Region    Height    Weight    Phone
况咏蜜     北京    171    48    13651054608
王心颜     上海    169    46    13813234424
马纤羽     深圳    173    50    13744234523
乔亦菲     广州    172    52    15823423525
罗梦竹     北京    175    49    18623423421
刘诺涵     北京    170    48    18623423765
岳妮妮     深圳    177    54    18835324553
贺婉萱     深圳    174    52    18933434452
叶梓萱    上海    171    49    18042432324
杜姗姗   北京    167    49       13324523342
Find the phone numbers in the data above.

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author: vita
import re
# "读文件.txt" holds the table above; the with block closes the file when done.
with open(file="读文件.txt", mode="r", encoding="utf-8") as f:
    data = f.read()
print(re.findall('[0-9]{11}', data))

E:\PythonProject\python-test\venvP3\Scripts\python.exe E:/PythonProject/python-test/BasicGrammer/test.py
['13744234523', '15823423525', '18623423421', '18623423765', '18835324553', '18933434452', '18042432324', '13324523342', '13542342233']

Process finished with exit code 0
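The same search can be tried without a data file. The following is a minimal sketch, assuming the rows are held in a string (`sample` contains three rows in the same layout as the table) and assuming a phone number is exactly 11 digits starting with 1. The `(?<!\d)` and `(?!\d)` lookarounds are an addition made here so that a longer run of digits is not cut into an 11-digit piece.

```python
import re

# Three rows in the same layout as the table, held in memory instead of a file.
sample = """况咏蜜     北京    171    48    13651054608
王心颜     上海    169    46    13813234424
马纤羽     深圳    173    50    13744234523"""

# A 1 followed by ten more digits, not embedded in a longer run of digits.
phone_pattern = re.compile(r"(?<!\d)1[0-9]{10}(?!\d)")

phones = phone_pattern.findall(sample)
print(phones)
```

The height and weight columns are ignored because they are shorter than 11 digits.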

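2. re matching syntax

re.match      match from the start of the string
re.search     match anywhere in the string
re.findall    return all non-overlapping matches as a list
re.split      split the string, using the matches as separators
re.sub        replace the matches
re.fullmatch  match the entire string
prog = re.compile(pattern)
result = prog.match(string)
The two steps above are equivalent to
result = re.match(pattern, string)

pattern  the regular expression
string   the string to search
flags    flags that control how the pattern is matched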
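The equivalence between a compiled pattern and the module-level functions can be checked directly. This is a small sketch; the sample string is borrowed from the later sections and the variable names are chosen only for illustration.

```python
import re

pattern = r"\d+"
text = "u123uu888asf"

# Compile once, then reuse the pattern object.
prog = re.compile(pattern)
compiled_result = prog.match(text)        # None: the string does not start with a digit
direct_result = re.match(pattern, text)   # same outcome through the module-level function

# A compiled pattern offers the same operations as methods.
all_numbers = prog.findall(text)
print(compiled_result, direct_result, all_numbers)
```

Compiling is mainly worthwhile when the same pattern is applied many times, for example inside a loop.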

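2.1 Flags

re.I (re.IGNORECASE): ignore case (the full name is in parentheses; same below)
re.M (re.MULTILINE): multi-line mode; changes the behavior of '^' and '$' so that they also match at the start and end of each line
re.S (re.DOTALL): changes the behavior of '.' so that it matches any character at all, including a newline; without this flag, '.' matches anything except a newline
re.X (re.VERBOSE): lets you add whitespace and comments to a pattern to make it more readable; the two patterns below mean the same thing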
a = re.compile(r"""\d + # the integral part
                \. # the decimal point
                \d * # some fractional digits""", 
                re.X)

b = re.compile(r"\d+\.\d*")
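The other flags can be tried out the same way. The sketch below uses a made-up three-line string (`text` is not from the original post) and shows each flag on its own, then two flags combined with the bitwise OR operator.

```python
import re

text = "Foo1\nfoo2\nBAR"

# re.I: case-insensitive matching.
ignore_case = re.findall(r"foo\d", text, flags=re.I)

# re.M: '^' and '$' also match at the start and end of each line.
line_starts = re.findall(r"^\w", text, flags=re.M)

# re.S: '.' also matches a newline.
dot_all = re.search(r"Foo1.foo2", text, flags=re.S)

# Flags are combined with the bitwise OR operator.
combined = re.findall(r"^foo\d$", text, flags=re.I | re.M)

print(ignore_case, line_starts, repr(dot_all.group()), combined)
```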

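2.2 Common pattern syntax

'.'     matches any single character except \n by default; with the DOTALL flag it matches any character, including a newline
'^'     matches the start of the string; with the MULTILINE flag it also matches at the start of each line, so re.search(r"^a","\nabc\neee",flags=re.MULTILINE) finds a match
'$'     matches the end of the string; with the MULTILINE flag it also matches at the end of each line, so re.search('foo.$','foo1\nfoo2\n',re.MULTILINE).group() returns 'foo1'
'*'     matches the preceding character 0 or more times; re.search('a*','aaaabac').group() returns 'aaaa'
'+'     matches the preceding character 1 or more times; re.findall("ab+","ab+cd+abb+bba") returns ['ab', 'abb']
'?'     matches the preceding character 0 or 1 times; re.search('b?','alex').group() returns '' (b matched 0 times)
'{m}'   matches the preceding character exactly m times; re.search('b{3}','alexbbbs').group() returns 'bbb'
'{n,m}' matches the preceding character n to m times; re.findall("ab{1,3}","abb abc abbcbbb") returns ['abb', 'ab', 'abb']
'|'     matches the expression on its left or on its right; re.search("abc|ABC","ABCBabcCD").group() returns 'ABC'
'(...)' grouping; re.search("(abc){2}a(123|45)", "abcabca456c").group() returns 'abcabca45'

'\A'    matches only at the start of the string; re.search(r"\Aabc","alexabc") finds nothing, just like re.match('abc',"alexabc") or '^' without MULTILINE
'\Z'    matches only at the end of the string; similar to '$', except that '$' also matches just before a trailing newline
'\d'    matches a digit, [0-9] (for str patterns also other Unicode decimal digits, unless re.ASCII is set)
'\D'    matches any character that is not a digit
'\w'    matches a word character, [A-Za-z0-9_] (for str patterns also Unicode letters and digits, unless re.ASCII is set)
'\W'    matches any character that is not a word character
'\s'    matches whitespace such as space, \t, \n, \r; re.search(r"\s+","ab\tc1\n3").group() returns '\t'

'(?P<name>...)' named group; re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})","371481199306143242").groupdict() returns {'province': '3714', 'city': '81', 'birthday': '1993'}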
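Named groups can be read back in several ways. The sketch below reuses the ID string from the last rule; the second pattern (`sign` and `digits`) is made up here to show what the optional argument of `groupdict()` does.

```python
import re

id_pattern = re.compile(
    r"(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})"
)
m = id_pattern.search("371481199306143242")

city_by_name = m.group("city")    # look a group up by name
city_by_index = m.group(2)        # or by position (groups are numbered from 1)
fields = m.groupdict()            # all named groups as a dict

# The optional argument of groupdict() is only a default value for named
# groups that did not take part in the match.
optional = re.search(r"(?P<sign>-)?(?P<digits>[0-9]+)", "42")
with_default = optional.groupdict("none")

print(city_by_name, city_by_index, fields, with_default)
```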

2.2.1 Verification

'.'     matches any single character except \n by default; with the DOTALL flag it matches any character, including a newline
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author: vita
import re
print(re.match(".+","\nabc"))
print(re.match(".+","\nabc",flags=re.DOTALL))

E:\PythonProject\python-test\venvP3\Scripts\python.exe E:/PythonProject/python-test/BasicGrammer/test.py
None
<_sre.SRE_Match object; span=(0, 4), match='\nabc'>

Process finished with exit code 0

'^'     matches the start of the string; with the MULTILINE flag it also matches at the start of each line, so re.search(r"^a","\nabc\neee",flags=re.MULTILINE) finds a match

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author: vita
import re
print(re.search("^a","\nabc"))
print(re.search("^a","\nabc",flags=re.MULTILINE))

E:\PythonProject\python-test\venvP3\Scripts\python.exe E:/PythonProject/python-test/BasicGrammer/test.py
None
<_sre.SRE_Match object; span=(1, 2), match='a'>

Process finished with exit code 0

'$'     matches the end of the string; with the MULTILINE flag it also matches at the end of each line, so re.search('foo.$','foo1\nfoo2\n',re.MULTILINE).group() returns 'foo1'

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author: vita
import re
print(re.search("foo.$","\nabc\nfoo1\nfoo2"))
print(re.search("foo.$","\nabc\nfoo1\nfoo2",flags=re.MULTILINE))

E:\PythonProject\python-test\venvP3\Scripts\python.exe E:/PythonProject/python-test/BasicGrammer/test.py
<_sre.SRE_Match object; span=(10, 14), match='foo2'>
<_sre.SRE_Match object; span=(5, 9), match='foo1'>

Process finished with exit code 0
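The remaining rules from section 2.2 can be checked in the same way. The sketch below collects the examples from the rule list into a dictionary (the name `checks` and its keys are chosen here for illustration) and prints each result.

```python
import re

checks = {
    "star": re.search(r"a*", "aaaabac").group(),
    "plus": re.findall(r"ab+", "ab+cd+abb+bba"),
    "optional": re.search(r"b?", "alex").group(),
    "exact": re.search(r"b{3}", "alexbbbs").group(),
    "range": re.findall(r"ab{1,3}", "abb abc abbcbbb"),
    "either": re.search(r"abc|ABC", "ABCBabcCD").group(),
    "group": re.search(r"(abc){2}a(123|45)", "abcabca456c").group(),
    "space": re.search(r"\s+", "ab\tc1\n3").group(),
}

for name, result in checks.items():
    print(name, repr(result))
```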

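2.3 re.match(pattern, string, flags=0)

Matches the pattern at the start of the string only and returns a single match object, or None if the start does not match.
pattern  the regular expression
string   the string to search
flags    flags that control how the pattern is matched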

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author: vita
import re
# Raw strings keep the backslash in \d from being read as a string escape.
match_obj = re.match(r'u\d+','u123uu888asf')
print("re.match",match_obj)

match_obj = re.match(r'\d+','u123uu888asf')
print("re.match",match_obj)

match_obj = re.match(r'\d+','123uuasf234')
print("re.match",match_obj)

E:\PythonProject\python-test\venvP3\Scripts\python.exe E:/PythonProject/python-test/BasicGrammer/test.py
re.match <_sre.SRE_Match object; span=(0, 4), match='u123'>
re.match None
re.match <_sre.SRE_Match object; span=(0, 3), match='123'>

Process finished with exit code 0

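2.4 re.search(pattern, string, flags=0)

Scans the string for the first place where the pattern matches and returns a single match object, or None if there is no match.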

#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author: vita
import re
search_obj = re.search(r'\d+','u123uu888asf')
print("re.search",search_obj)
print("re.search",search_obj.group())

E:\PythonProject\python-test\venvP3\Scripts\python.exe E:/PythonProject/python-test/BasicGrammer/test.py
re.search <_sre.SRE_Match object; span=(1, 4), match='123'>
re.search 123

Process finished with exit code 0
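Both re.match and re.search return None when nothing matches, so calling .group() on the result without a check raises AttributeError. A minimal sketch of a guard follows; `first_number` is a hypothetical helper written for this example, not part of the re module.

```python
import re

def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    found = re.search(r"\d+", text)
    return found.group() if found else None

print(first_number("u123uu888asf"))
print(first_number("no digits here"))

# re.match only looks at the start of the string, so it fails where search succeeds.
at_start = re.match(r"\d+", "u123uu888asf")
anywhere = re.search(r"\d+", "u123uu888asf")
```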

2.5 re.findall(pattern, string, flags=0)

match and search each return a single match, that is, at most one occurrence in the string. To get every occurrence that matches, use findall.
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# Author: vita
import re
obj = re.findall(r'\d+', 'fa123uu888asf')
print(obj)

E:\PythonProject\python-test\venvP3\Scripts\python.exe E:/PythonProject/python-test/BasicGrammer/test.py
['123', '888']

Process finished with exit code 0
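Two details of findall are worth trying out: when the pattern contains groups it returns the groups rather than the whole match, and re.finditer yields match objects when positions are needed as well. A short sketch on the same string (the variable names are illustrative only):

```python
import re

text = "fa123uu888asf"

# No groups: a list of the matched strings.
plain = re.findall(r"\d+", text)

# Two groups: a list of tuples, one item per group.
pairs = re.findall(r"([a-z]+)(\d+)", text)

# finditer yields match objects, which also carry the position of each match.
spans = [(m.group(), m.span()) for m in re.finditer(r"\d+", text)]

print(plain, pairs, spans)
```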

2.6 re.sub(pattern, repl, string, count=0, flags=0)

Replaces the matches of pattern in string with repl; count limits the number of replacements (0 means replace all).
>>> re.sub('abc', 'supergirl', 'vita is abc123')
'vita is supergirl123'
>>> re.sub(r'\d+', 'supergirl', '丽丽abc123')
'丽丽abcsupergirl'
>>> re.sub('[a-z]+', 'supergirl', '丽丽abc123')
'丽丽supergirl123'
>>>
>>> re.sub(r'\d+', '|', 'vita22lili33lyly55', count=2)
'vita|lili|lyly55'
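The repl argument does not have to be a fixed string. As a sketch of two further options that re.sub supports, the example below uses group references in the replacement and a function as the replacement, and also shows re.subn, which reports how many replacements were made. The sample string is the one used above.

```python
import re

text = "vita22lili33lyly55"

# \1 and \2 in the replacement refer to the groups of the pattern.
swapped = re.sub(r"([a-z]+)(\d+)", r"\2\1", text)

# A function receives each match object and returns the replacement text.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text)

# subn returns the new string together with the number of replacements.
replaced, count = re.subn(r"\d+", "|", text)

print(swapped, doubled, replaced, count)
```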

2.7 re.split(pattern, string, maxsplit=0, flags=0)

Splits string at every match of pattern; maxsplit limits the number of splits (0 means no limit).
>>> s = '9-2*5/3+7/3*99/4*2998+10*568/14'
>>> re.split(r'[\*\-\/\+]', s)
['9', '2', '5', '3', '7', '3', '99', '4', '2998', '10', '568', '14']

>>> re.split(r'[\*\-\/\+]', s, maxsplit=3)
['9', '2', '5', '3+7/3*99/4*2998+10*568/14']
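If the separators themselves are needed, for example to keep the operators of the expression, the pattern can be wrapped in a capturing group. A small sketch with a shortened expression (the string `expr` is made up for this example):

```python
import re

expr = "9-2*5/3+7"

# Without a group the operators are dropped.
numbers = re.split(r"[*\-/+]", expr)

# With a capturing group the matched separators are kept in the result.
tokens = re.split(r"([*\-/+])", expr)

# maxsplit stops after the given number of splits.
first_two = re.split(r"[*\-/+]", expr, maxsplit=2)

print(numbers, tokens, first_two)
```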

2.8 re.fullmatch(pattern, string, flags=0)

Returns a match object if the whole string matches the pattern, otherwise None.
>>> re.fullmatch(r'\w+@\w+\.(com|cn|edu)', "[email protected]")
<_sre.SRE_Match object; span=(0, 17), match='[email protected]'>
>>> re.fullmatch(r'\w+@\w+\.', "[email protected]")
>>>
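The address in the session above is masked, so here is a runnable sketch with a hypothetical address (`user@example.com` is an assumption, not the author's data), assuming the pattern was meant to be `\w+@\w+\.(com|cn|edu)`. It also contrasts fullmatch with match, which only anchors at the start of the string.

```python
import re

email_pattern = r"\w+@\w+\.(com|cn|edu)"
address = "user@example.com"   # hypothetical address for illustration

whole = re.fullmatch(email_pattern, address)             # the entire string matches
trailing = re.fullmatch(email_pattern, address + "!")    # extra character at the end: None
prefix_only = re.match(email_pattern, address + "!")     # match accepts a matching prefix
no_suffix = re.fullmatch(r"\w+@\w+\.", address)          # pattern stops before "com": None

print(whole, trailing, prefix_only, no_suffix)
```

This pattern is only a teaching example; it rejects many valid addresses, such as those with dots or hyphens in the name.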
