Python Standard Library: the re module

Posted 2020-09-21

re: regular expressions

__all__ = [
    "match", "fullmatch", "search", "sub", "subn", "split",
    "findall", "finditer", "compile", "purge", "template", "escape",
    "error", "A", "I", "L", "M", "S", "X", "U",
    "ASCII", "IGNORECASE", "LOCALE", "MULTILINE", "DOTALL", "VERBOSE",
    "UNICODE",
]

Some constants

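I = IGNORECASE = sre_compile.SRE_FLAG_IGNORECASE # ignore case
# Makes matching case-insensitive.
L = LOCALE = sre_compile.SRE_FLAG_LOCALE # assume current 8-bit locale
# Makes \w, \W, \b, and \B depend on the current locale. Locales are a feature of the C library intended to help with programs that must account for language differences.
# For example, if you are processing French text, you would want \w+ to match words, but for 8-bit (bytes) patterns \w matches only the character class [A-Za-z0-9_]; it does not match "é" or "ç". If the system is configured appropriately and the locale is set to French, the underlying C functions report that "é" should also be treated as a letter.
# Compiling a regular expression with the LOCALE flag produces a compiled object that uses these C functions for \w; this is slower, but \w+ then matches French words as expected. In Python 3 this flag is only allowed with bytes patterns; str patterns already match Unicode word characters by default.
M = MULTILINE = sre_compile.SRE_FLAG_MULTILINE # make anchors look for newline
# By default, "^" matches only at the start of the string, and "$" matches only at the end of the string and just before a newline (if any) at the end of the string.
# With this flag, "^" matches at the start of the string and at the start of each line within it. Likewise, "$" matches at the end of the string and at the end of each line (just before each newline).
S = DOTALL = sre_compile.SRE_FLAG_DOTALL # make dot match newline
# Makes the "." special character match any character at all, including a newline; without this flag, "." matches anything except a newline.
X = VERBOSE = sre_compile.SRE_FLAG_VERBOSE # ignore whitespace and comments
# With this flag, whitespace in the RE string is ignored unless it is inside a character class or preceded by a backslash; this lets you lay out and indent the RE more clearly. It also lets you write comments in the RE, which the engine ignores; a comment starts with "#", provided the "#" is not inside a character class or preceded by a backslash.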
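
The effect of these flags is easiest to see side by side. The following is a minimal sketch; the sample text and the variable names are made up for illustration and are not part of the re module.

```python
import re

text = "Hello\nworld\nHELLO again"

# IGNORECASE: 'hello' matches both "Hello" and "HELLO".
ci = re.findall(r"hello", text, re.IGNORECASE)

# MULTILINE: '^' matches at the start of every line, not only the string.
first_only = re.findall(r"^\w+", text)
each_line = re.findall(r"^\w+", text, re.MULTILINE)

# DOTALL: '.' also matches the newline between "Hello" and "world".
no_dotall = re.search(r"Hello.world", text)
with_dotall = re.search(r"Hello.world", text, re.DOTALL)

# VERBOSE: whitespace and '#' comments inside the pattern are ignored.
number = re.compile(r"""
    \d+       # integer part
    (\.\d+)?  # optional fractional part
""", re.VERBOSE)
found = number.search("pi is about 3.14").group()

print(ci)                    # ['Hello', 'HELLO']
print(first_only)            # ['Hello']
print(each_line)             # ['Hello', 'world', 'HELLO']
print(no_dotall)             # None
print(repr(with_dotall.group()))  # 'Hello\nworld'
print(found)                 # 3.14
```

Flags can be combined with the | operator, for example re.IGNORECASE | re.MULTILINE.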

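Functions

match(): matches only at the start of the string; returns None if there is no match there

search(): scans the whole string and returns the first match, or None if there is none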

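import re

pattern = 'this'
text = 'Does this text match this pattern?'

match = re.match(pattern, text)    # None: 'this' is not at the start of text
search = re.search(pattern, text)  # first occurrence anywhere in text

s = search.start()  # index where the match begins
e = search.end()    # index just past the end of the match

print(match)
print(search.re.pattern)
print(search.string)
print(s)
print(e)
print(text[s:e])

"""
None
this
Does this text match this pattern?
5
9
this
"""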
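
Because match() and search() both return None on failure, calling a method on the result without checking it first raises AttributeError. The sketch below contrasts match(), search(), and fullmatch() (which is listed in __all__ above) on the same text; the variable names are chosen for illustration only.

```python
import re

text = "Does this text match this pattern?"

# match() is anchored at index 0.
at_start = re.match(r"Does", text)
not_at_start = re.match(r"this", text)

# search() scans forward and stops at the first occurrence.
first = re.search(r"this", text)

# fullmatch() succeeds only if the pattern covers the entire string.
whole = re.fullmatch(r"Does.*\?", text)
partial = re.fullmatch(r"Does", text)

print(at_start.span())     # (0, 4)
print(not_at_start)        # None
print(first.span())        # (5, 9)
print(whole is not None)   # True
print(partial)             # None

# Guard against None before using the result.
if not_at_start is None:
    print("no match at the start of the string")
```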

compile(): compiles a pattern into a reusable regular expression object

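# pattern and text are the ones defined in the previous example
regex = re.compile(pattern)

print(regex.match(text))
print(regex.search(text))

# Output from Python 3.6 or earlier; Python 3.7+ prints
# <re.Match object; span=(5, 9), match='this'> for the second line.
"""
None
<_sre.SRE_Match object; span=(5, 9), match='this'>
"""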
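
A compiled pattern is most useful when the same expression is applied to many strings. The following is a small sketch under that assumption; the log lines and the names error_re and messages are hypothetical. Note that the module-level functions also cache recently compiled patterns, so the gain is often more about readability than speed.

```python
import re

# Hypothetical sample data, used only for illustration.
lines = ["error: disk full", "ok", "error: timeout", "warning: slow"]

# Compile once, then reuse the pattern object for every line.
error_re = re.compile(r"^error:\s*(.+)$")

messages = []
for line in lines:
    m = error_re.match(line)
    if m:  # match() returns None when the line does not match
        messages.append(m.group(1))

print(messages)  # ['disk full', 'timeout']
```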

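findall() and finditer()

findall() returns a list of all non-overlapping matches as strings. finditer() returns an iterator that yields Match instances; use group(), start(), and end() to read the details of each match.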

text = 'abbaaabbbbaaaabbbbbaaa'
pattern = 'ab'

print(re.findall(pattern, text))

ab = re.finditer(pattern, text)

for match in ab:
    print(match)

print()

# The iterator above is now exhausted, so create a fresh one.
for match in re.finditer(pattern, text):
    print(str(match.start()) + '->' + str(match.end()), end='=')
    print(match.group())

# Output from Python 3.6 or earlier; Python 3.7+ prints re.Match
# instead of _sre.SRE_Match.
"""
['ab', 'ab', 'ab']
<_sre.SRE_Match object; span=(0, 2), match='ab'>
<_sre.SRE_Match object; span=(5, 7), match='ab'>
<_sre.SRE_Match object; span=(13, 15), match='ab'>

0->2=ab
5->7=ab
13->15=ab
"""
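
One detail that often surprises readers is that the return value of findall() changes when the pattern contains capturing groups. A minimal sketch, with a made-up sample string:

```python
import re

text = "x=1, y=22, z=333"

# No groups: findall() returns the full text of each match.
full = re.findall(r"\w=\d+", text)

# One group: it returns only that group's text.
values = re.findall(r"\w=(\d+)", text)

# Several groups: it returns a list of tuples, one tuple per match.
pairs = re.findall(r"(\w)=(\d+)", text)

print(full)    # ['x=1', 'y=22', 'z=333']
print(values)  # ['1', '22', '333']
print(pairs)   # [('x', '1'), ('y', '22'), ('z', '333')]
```

When the full match is needed alongside the groups, finditer() is usually the simpler choice, since each Match object exposes both.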

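groups(): returns a tuple containing the text of every capturing group

group() or group(0): returns the whole matched string

group(1), group(2), ...: return the text matched by the corresponding capturing group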
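
The difference between these calls can be sketched with a pattern that has two capturing groups. The input string here is invented for illustration.

```python
import re

# Two capturing groups: a key and a numeric value separated by '='.
m = re.match(r"(\w+)=(\d+)", "width=640")

print(m.group())      # width=640   (whole match)
print(m.group(0))     # width=640   (same as group())
print(m.group(1))     # width       (first group)
print(m.group(2))     # 640         (second group)
print(m.groups())     # ('width', '640')
print(m.group(1, 2))  # ('width', '640')  (several groups at once)
```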

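sub() and subn()

sub() returns the string with matches replaced. subn() does the same but returns a tuple (new_string, number_of_substitutions).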

bold = re.compile(r'\*{2}(.*?)\*{2}')

text = "Make this **bold**.  This **too**."

print(text)

# count=1 limits the replacement to the first match.
print(bold.sub(r'<b>\1</b>', text, count=1))

# Without count, subn() replaces every match and reports how many.
print(bold.subn(r'<b>\1</b>', text))

"""
Make this **bold**.  This **too**.
Make this <b>bold</b>.  This **too**.
('Make this <b>bold</b>.  This <b>too</b>.', 2)
"""
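
The replacement argument of sub() and subn() can also be a function that receives each Match object and returns the replacement text. The sketch below reuses the bold pattern from above; the helper name shout is hypothetical.

```python
import re

bold = re.compile(r"\*{2}(.*?)\*{2}")
text = "Make this **bold**.  This **too**."

def shout(match):
    # Called once per match; the return value replaces the matched text.
    return "<b>" + match.group(1).upper() + "</b>"

result, count = bold.subn(shout, text)

print(result)  # Make this <b>BOLD</b>.  This <b>TOO</b>.
print(count)   # 2
```

A function is worth using when the replacement depends on the matched text in a way that a backreference such as \1 cannot express.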
