Python and Regular Expressions

Posted 2020-12-27 大大的大笨熊

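This post assumes familiarity with general regular-expression syntax. Each programming language adds some matching constructs of its own, and Python is no exception: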

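Syntax	Meaning	Example pattern	Strings matched in full
\A	Matches only at the start of the string	\Aabc	abc
\Z	Matches only at the end of the string	abc\Z	abc
(?P<name>...)	Group that gets an alias in addition to its usual number	(?P<id>abc){2}	abcabc
(?P=name)	Refers back to the text matched by the group with alias name	(?P<id>\d)abc(?P=id)	1abc1, 5abc5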
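
The constructs in the table can be tried out with a short sketch like the following. It is written for Python 3, the group name id follows the table, and the sample strings are chosen only for illustration.

```python
import re

text = "abc\nabc"

# \A anchors at the very start of the string, even in multi-line mode,
# while ^ with re.M matches at the start of every line.
anchored = re.findall(r"\Aabc", text, re.M)
per_line = re.findall(r"^abc", text, re.M)
print(anchored, per_line)

# (?P<id>...) names a group; the group still keeps its number (1 here).
m = re.match(r"(?P<id>abc){2}", "abcabc")
print(m.group(0), m.group("id"), m.group(1))

# (?P=id) must repeat exactly the text that the group "id" matched.
backref = re.compile(r"(?P<id>\d)abc(?P=id)")
same_digit = backref.match("1abc1")
other_digit = backref.match("1abc5")
print(bool(same_digit), bool(other_digit))
```
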

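To match a literal \ from Python code, the expression has to be written as '\\\\' (four backslashes): the Python string literal needs \\ to represent one \, and the regex engine in turn needs \\ to match one \. Python's raw strings avoid this doubling: a regex that matches one \ can be written r'\\', and likewise '\\d', which matches a digit, can be written r'\d'.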
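
A small sketch (Python 3; the variable names and the sample path are illustrative assumptions) shows that the ordinary and the raw literal produce the same pattern:

```python
import re

plain = "\\\\"  # ordinary literal: Python stores two backslash characters
raw = r"\\"     # raw literal: the same two characters, written as-is
same_pattern = plain == raw

path = "C:\\Python27"          # the text contains one real backslash
found = re.findall(raw, path)  # the regex \\ matches one literal backslash

# The same applies to \d: '\\d' and r'\d' are the same pattern.
digits_plain = re.findall("\\d+", "a12b3")
digits_raw = re.findall(r"\d+", "a12b3")
print(same_pattern, found, digits_plain, digits_raw)
```
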

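Python supports regular expressions through the re module. The usual workflow is to compile the string form of the regular expression into a Pattern object, use that Pattern to process text and obtain a match result, and then read information from the resulting Match object for further processing.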

The main functions are:

re.compile(pattern[, flags])
re.match(pattern, string[, flags])
re.search(pattern, string[, flags])
re.split(pattern, string[, maxsplit, flags])
re.findall(pattern, string[, flags])
re.finditer(pattern, string[, flags])
re.sub(pattern, repl, string[, count, flags])
re.subn(pattern, repl, string[, count, flags])

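pattern = re.compile(r'\d+')
The flags argument selects the matching mode. Several flags can be combined with |, for example re.I | re.M. The available values are:
- re.I: ignore case
- re.M: multi-line mode, changes the behavior of "^" and "$" so that they also match at the start and end of each line
- re.S: dot-matches-all mode, changes the behavior of "." so that it also matches a newline
- re.L: makes the predefined classes \w \W \b \B \s \S depend on the current locale
- re.U: makes the predefined classes \w \W \b \B \s \S \d \D depend on Unicode character properties
- re.X: verbose mode, in which the pattern may span several lines, whitespace is ignored, and comments may be added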
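
The effect of the most common flags can be seen in a brief sketch. It is written for Python 3, and the sample text and variable names are assumptions made for illustration only.

```python
import re

text = "Hello\nworld"

# re.I: case-insensitive matching
ignore_case = re.findall(r"hello", text, re.I)

# re.M: ^ and $ also match at the start and end of each line
line_starts = re.findall(r"^\w+", text, re.M)

# re.S: "." also matches the newline character
without_s = re.search(r"Hello.world", text)
with_s = re.search(r"Hello.world", text, re.S)

# re.X: whitespace and comments inside the pattern are ignored
number = re.compile(r"""
    \d+   # integer part
    \.    # decimal point
    \d+   # fractional part
""", re.X)
decimal = number.search("pi is 3.14").group()

# Flags are combined with |
combined = re.findall(r"^hello$", text, re.I | re.M)

print(ignore_case, line_starts, without_s, with_s.group() == text)
print(decimal, combined)
```
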

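1. re.match(pattern, string[, flags])

This function tries to match pattern starting at the beginning of string. If the text at the start of string matches, it returns a Match object (the match does not have to extend to the end of string). If the pattern cannot be matched there, it returns None.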

# coding: utf-8
import re
pattern = re.compile(r'\d+')

# the string starts with digits, so match succeeds
result1 = re.match(pattern, '192abc')
if result1:
    print(result1.group())
else:
    print('match failed')
# the digits are not at the start, so match returns None
result2 = re.match(pattern, 'abc123')
if result2:
    print(result2.group())
else:
    print('match failed')

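2. re.search(pattern, string[, flags])

search is very similar to match. The difference is that match only tries the pattern at the start of string and returns None when no match begins there, while search scans the whole string and returns the first match found anywhere in it. The objects returned by search and match expose the same methods and attributes.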

# coding: utf-8
import re
pattern = re.compile(r'\d+')

# the digits are in the middle, search still finds them
result1 = re.search(pattern, 'abc192abc')
if result1:
    print(result1.group())
else:
    print('match failed')
# only the first match is returned
result2 = re.search(pattern, '123abc123')
if result2:
    print(result2.group())
else:
    print('match failed')


Output:
C:\Python27\python.exe F:/python_scrapy/ch04/4.2.2.1.py
192
123

Process finished with exit code 0

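3. re.split(pattern, string[, maxsplit, flags])

Splits string at every substring that matches pattern and returns the pieces as a list. maxsplit sets the maximum number of splits; if it is omitted, the string is split at every match.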

# coding: utf-8
import re
pattern = re.compile(r'\d+')
# split at every run of digits
print(re.split(pattern, 'A1B2C2De2'))

Output:
C:\Python27\python.exe F:/python_scrapy/ch04/4.2.2.3.py
['A', 'B', 'C', 'De', '']

Process finished with exit code 0
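
The maxsplit argument described above is not exercised by the listing, so here is a minimal sketch of it in Python 3, reusing the same sample string. The last line also shows a related behavior of re.split: when the pattern contains a capturing group, the separators are kept in the result.

```python
import re

pattern = re.compile(r"\d+")

# Without maxsplit the string is split at every match.
all_parts = pattern.split("A1B2C2De2")

# With maxsplit=2 only the first two matches split the string;
# the remainder is returned unsplit as the last element.
two_splits = pattern.split("A1B2C2De2", maxsplit=2)

# A capturing group in the pattern keeps the separators in the list.
with_separators = re.split(r"(\d+)", "A1B2")

print(all_parts)
print(two_splits)
print(with_separators)
```
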

4. re.findall(pattern, string[, flags])

Searches the whole string and returns all matching substrings as a list.

# coding: utf-8
import re
pattern = re.compile(r'\d+')
# collect every run of digits
print(re.findall(pattern, 'A1B2C2De2'))

Output:
C:\Python27\python.exe F:/python_scrapy/ch04/4.2.2.3.py
['1', '2', '2', '2']

Process finished with exit code 0
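
One detail worth knowing is that the shape of the findall result depends on the capturing groups in the pattern. The following Python 3 sketch illustrates this with the same sample string; the patterns themselves are assumptions chosen for the demonstration.

```python
import re

text = "A1B2C2De2"

# No group: each item is the whole match.
whole = re.findall(r"[A-Za-z]+\d", text)

# One group: each item is the text of that group only.
one_group = re.findall(r"[A-Za-z]+(\d)", text)

# Several groups: each item is a tuple with one entry per group.
two_groups = re.findall(r"([A-Za-z]+)(\d)", text)

print(whole)
print(one_group)
print(two_groups)
```
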

5. re.finditer(pattern, string[, flags])

Searches the whole string and returns an iterator over the Match objects of all matches.

# coding: utf-8
import re
pattern = re.compile(r'\d+')
matchiter = re.finditer(pattern, 'A1B2C2De2')
# each item is a Match object, not a plain string
for match in matchiter:
    print(match.group())

Output:
C:\Python27\python.exe F:/python_scrapy/ch04/4.2.2.3.py
1
2
2
2

Process finished with exit code 0
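
Because finditer yields Match objects rather than strings, the position of each match is available as well. A minimal Python 3 sketch, with the variable name positions being an illustrative assumption:

```python
import re

# Collect the matched text together with its start and end index.
positions = [
    (m.group(), m.start(), m.end())
    for m in re.finditer(r"\d+", "A1B2C2De2")
]
print(positions)
```
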

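6. re.sub(pattern, repl, string[, count, flags])

Replaces every match in string with repl and returns the resulting string. When repl is a string, it can reference groups with \id, \g<id>, or \g<name>; \0 cannot be used to refer to the whole match (use \g<0> for that). When repl is a function, it must take a single argument (the Match object) and return the replacement string; the returned string is used literally, so group references inside it are not expanded. count sets the maximum number of replacements; if it is omitted, all matches are replaced.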

# coding: utf-8
import re
p = re.compile(r'(?P<word1>\w+) (?P<word2>\w+)')  # named groups
s = 'i say,hello world!'
print(p.sub(r'\g<word2> \g<word1>', s))  # repl is a string, groups referenced by name
p = re.compile(r'(\w+) (\w+)')  # numbered groups
print(p.sub(r'\2 \1', s))  # repl is a string, groups referenced by number
def func(m):
    return m.group(1).title() + ' ' + m.group(2).title()
print(p.sub(func, s))  # repl is a function

Output:
C:\Python27\python.exe F:/python_scrapy/ch04/4.2.2.6.py
say i,world hello!
say i,world hello!
I Say,Hello World!

Process finished with exit code 0
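
The listing does not cover count, \g<0>, or the literal handling of a function's return value, so the following Python 3 sketch tries them on the same sample string. The variable names are illustrative assumptions.

```python
import re

p = re.compile(r"(\w+) (\w+)")
s = "i say,hello world!"

# count=1 replaces only the first match.
first_only = p.sub(r"\2 \1", s, count=1)

# \g<0> refers to the whole match.
wrapped = p.sub(r"[\g<0>]", s)

# A string returned by a function is inserted literally,
# so the \1 below is not expanded to group 1.
literal = p.sub(lambda m: r"\1", s)

print(first_only)
print(wrapped)
print(literal)
```
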

7. re.subn(pattern, repl, string[, count, flags])

Works like sub(pattern, repl, string[, count, flags]) but returns a tuple (new_string, number_of_substitutions).

# coding: utf-8
import re
p = re.compile(r'(?P<word1>\w+) (?P<word2>\w+)')  # named groups
s = 'i say,hello world!'
print(p.subn(r'\g<word2> \g<word1>', s))  # repl is a string, groups referenced by name
p = re.compile(r'(\w+) (\w+)')  # numbered groups
print(p.subn(r'\2 \1', s))  # repl is a string, groups referenced by number
def func(m):
    return m.group(1).title() + ' ' + m.group(2).title()
print(p.subn(func, s))  # repl is a function


Output:
C:\Python27\python.exe F:/python_scrapy/ch04/4.2.2.6.py
('say i,world hello!', 2)
('say i,world hello!', 2)
('I Say,Hello World!', 2)

Process finished with exit code 0

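Attributes and methods of the Match object

Attribute or method	Description
pos	Index in the string at which the search started
endpos	Index in the string at which the search stops
string	The string that was searched
re	The pattern object that produced this match
lastindex	Index of the last matched capturing group
lastgroup	Name of the last matched capturing group
group([group1, ...])	Match of the given group; with no argument or with 0, the whole match; with several arguments, a tuple
groups()	Matches of all groups, returned as a tuple
groupdict()	Dictionary with group names as keys and the matched text of each named group as values
start([group])	Start index of the group's match
end([group])	End index of the group's match
span([group])	Tuple (start, end) of the group's match
expand(template)	Returns template with its group references replaced by the corresponding matches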


import re
# two numbered groups followed by one named group
pattern = re.compile(r'(\w+) (\w+) (?P<word>.*)')
match = pattern.match('I love you!')
print("match.string: {}".format(match.string))
print("match.re: {}".format(match.re))
print("match.pos: {}".format(match.pos))
print("match.endpos: {}".format(match.endpos))
print("match.lastindex: {}".format(match.lastindex))
print("match.lastgroup: {}".format(match.lastgroup))
print("match.group(1,2): {}".format(match.group(1, 2)))
print("match.groups(): {}".format(match.groups()))
print("match.groupdict(): {}".format(match.groupdict()))
print("match.start(2): {}".format(match.start(2)))
print("match.end(2): {}".format(match.end(2)))
print("match.span(2): {}".format(match.span(2)))
print(r"match.expand(r'\2 \1 \3'): {}".format(match.expand(r'\2 \1 \3')))

Output:
C:\Python27\python.exe F:/python_scrapy/ch04/4.2.2.7.py
match.string: I love you!
match.re: <_sre.SRE_Pattern object at 0x020F7890>
match.pos: 0
match.endpos: 11
match.lastindex: 3
match.lastgroup: word
match.group(1,2): ('I', 'love')
match.groups(): ('I', 'love', 'you!')
match.groupdict(): {'word': 'you!'}
match.start(2): 2
match.end(2): 6
match.span(2): (2, 6)
match.expand(r'\2 \1 \3'): love I you!

Process finished with exit code 0
