Python Learning: Week 4 Summary

Posted 2021-09-05 月瘦如眉


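Regular expressions

Python offers two ways to use regular expressions:

Call the module-level functions directly, without creating a regular expression object

match
fullmatch

Create a regular expression object (Pattern) and call its methods to perform the matching

compile

Example: a website requires that usernames consist only of letters, digits, and underscores and be 6 to 20 characters long. How can we check whether a username is valid?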

import re


username = input('Enter a username: ')
# \w matches letters, digits, and underscores; ^ and $ anchor the pattern to the whole string
username_pattern = re.compile(r'^\w{6,20}$')
print(type(username_pattern))
matcher = username_pattern.match(username)
print(type(matcher))
if matcher is None:
    print('Invalid username!')
else:
    print(matcher.group())
# matcher = re.match(r'\w{6,20}$', username)
# if matcher is None:
#     print('Invalid username!')
# else:
#     print(matcher)
#     print(matcher.group())


# qq = input('Enter a QQ number: ')
# matcher = re.fullmatch(r'[1-9]\d{4,10}', qq)
# if matcher is None:
#     print('Invalid QQ number!')
# else:
#     print(matcher)
#     print(matcher.group())
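One detail worth checking in the username example: in Python 3, `\w` also matches non-ASCII word characters by default, and `$` matches just before a trailing newline. The sketch below is one possible way to tighten the check, assuming the requirement means ASCII letters only. The helper name `is_valid_username` is made up for illustration.

```python
import re

# Assumption: "letters" means ASCII letters, so re.ASCII restricts \w to [a-zA-Z0-9_].
USERNAME_PATTERN = re.compile(r'\w{6,20}', flags=re.ASCII)


def is_valid_username(username):
    """Return True if username is 6-20 ASCII letters, digits, or underscores."""
    # fullmatch must consume the whole string, so no ^ or $ is needed
    return USERNAME_PATTERN.fullmatch(username) is not None


print(is_valid_username('python_2105'))
print(is_valid_username('abc'))
print(is_valid_username('python_2105\n'))
```

With `fullmatch`, a trailing newline makes the check fail, whereas `re.match(r'^\w{6,20}$', ...)` would still accept it.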

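import re
content = """Emergency number: 110, our class is Python-2105,
my QQ is 123456, my mobile number is 15581572054, thanks!"""
# matcher = re.search(r'1[3-9]\d{9}', content)
# if not matcher:
#     print('No mobile number found')
# else:
#     print(matcher.group())

# Find every run of digits, resuming each search where the previous match ended
pattern = re.compile(r'\d+')
matcher = pattern.search(content)
while matcher:
    print(matcher.group())
    print(matcher.start(), matcher.end())
    matcher = pattern.search(content, matcher.end())

# findall on the compiled pattern returns all matches as a list of strings
results = pattern.findall(content)
for result in results:
    print(result)

# The module-level function gives the same result without a Pattern object
results = re.findall(r'\d+', content)
for result in results:
    print(result)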
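The `while` loop over `search` can also be written with `finditer`, which yields one match object per match. This is a minimal sketch with a made-up sample string, not the only way to do it.

```python
import re

# Sample text made up for illustration
content = 'Emergency number: 110, class Python-2105, QQ 123456, mobile 15581572054'
pattern = re.compile(r'\d+')

# Each match object carries the matched text and its position in the string
matches = [(m.group(), m.start(), m.end()) for m in pattern.finditer(content)]
for text, start, end in matches:
    print(text, start, end)
```

Unlike `findall`, this keeps the start and end positions, which `findall` discards.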

The table below briefly summarizes the basic symbols used in regular expressions.

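Symbol	Description	Example	Notes
`.`	Matches any character except a newline	`b.t`	Matches bat / but / b#t / b1t, etc.
`\w`	Matches a letter, digit, or underscore	`b\wt`	Matches bat / b1t / b_t, etc., but not b#t
`\s`	Matches a whitespace character (including \r, \n, \t, etc.)	`love\syou`	Matches love you
`\d`	Matches a digit	`\d\d`	Matches 01 / 23 / 99, etc.
`\b`	Matches a word boundary	`\bThe\b`
`^`	Matches the start of the string	`^The`	Matches strings that start with The
`$`	Matches the end of the string	`\.exe$`	Matches strings that end with .exe
`\W`	Matches a character that is not a letter, digit, or underscore	`b\Wt`	Matches b#t / b@t, etc., but not but / b1t / b_t
`\S`	Matches a non-whitespace character	`love\Syou`	Matches love#you, etc., but not love you
`\D`	Matches a non-digit	`\d\D`	Matches 9a / 3# / 0F, etc.
`\B`	Matches a position that is not a word boundary	`\Bio\B`
`[]`	Matches any single character from the character set	`[aeiou]`	Matches any one vowel
`[^]`	Matches any single character not in the character set	`[^aeiou]`	Matches any one character that is not a vowel
`*`	Matches 0 or more times	`\w*`
`+`	Matches 1 or more times	`\w+`
`?`	Matches 0 or 1 time	`\w?`
`{N}`	Matches exactly N times	`\w{3}`
`{M,}`	Matches at least M times	`\w{3,}`
`{M,N}`	Matches at least M and at most N times	`\w{3,6}`
`|`	Alternation	`foo|bar`	Matches foo or bar
`(?#)`	Comment
`(exp)`	Matches exp and captures it in an automatically numbered group
`(?P<name>exp)`	Matches exp and captures it in a group named name (Python syntax)
`(?:exp)`	Matches exp without capturing the matched text
`(?=exp)`	Matches the position before exp	`\b\w+(?=ing)`	Matches danc in I'm dancing
`(?<=exp)`	Matches the position after exp	`(?<=\bdanc)\w+\b`	Matches the first ing in I love dancing and reading
`(?!exp)`	Matches a position not followed by exp
`(?<!exp)`	Matches a position not preceded by exp
`*?`	Repeats any number of times, but as few as possible	`a.*b` `a.*?b`	Applied to aabab, the former matches the whole string aabab, while the latter matches the two strings aab and ab
`+?`	Repeats 1 or more times, but as few as possible
`??`	Repeats 0 or 1 time, but as few as possible
`{M,N}?`	Repeats M to N times, but as few as possible
`{M,}?`	Repeats M or more times, but as few as possible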
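A few of the less obvious rows can be checked directly in Python. The sketch below covers greedy versus lazy repetition, lookahead, lookbehind, and a named group. The date string is only sample input, and Python spells named groups as `(?P<name>exp)`.

```python
import re

# Greedy vs. lazy repetition on the string from the table
greedy = re.findall(r'a.*b', 'aabab')
lazy = re.findall(r'a.*?b', 'aabab')
print(greedy, lazy)

# Lookahead: a word prefix that is followed by "ing" (the "ing" is not consumed)
ahead = re.findall(r'\b\w+(?=ing)', "I'm dancing")
print(ahead)

# Lookbehind: the word characters that come right after "danc"
behind = re.findall(r'(?<=\bdanc)\w+\b', 'I love dancing and reading')
print(behind)

# Named groups: read captured text by name instead of by index
m = re.fullmatch(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})', '2021-09-05')
print(m.group('year'), m.group('month'), m.group('day'))
```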

Regular expression capture groups

Fetching news titles and links from a web page

import re
from urllib.parse import urljoin

import requests

# Match the whole <a> tag, but capture only the parts in parentheses (the capture groups)
pattern = re.compile(r'<a\s.*?href="(.+?)".*?title="(.+?)".*?>')
resp = requests.get('https://www.sohu.com/')
results = pattern.findall(resp.text)
for href, title in results:
    print(title)
    # Resolve relative links against the site root; absolute URLs are left unchanged
    href = urljoin('https://www.sohu.com/', href)
    print(href)
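To see what the capture groups return without making a network request, the same pattern can be run on a small HTML fragment. The fragment below is invented for illustration and is not taken from the real page. The named-group variant is an optional alternative.

```python
import re

# Hypothetical HTML fragment, used here instead of a live request
html = '''
<a class="news" href="/a/1001" title="First headline">First headline</a>
<a href="https://www.sohu.com/a/1002" target="_blank" title="Second headline">Second</a>
'''

# With two capture groups, findall returns a list of (href, title) tuples
pattern = re.compile(r'<a\s.*?href="(.+?)".*?title="(.+?)".*?>')
results = pattern.findall(html)
print(results)

# Same pattern with named groups; groupdict() gives one dict per link
named = re.compile(r'<a\s.*?href="(?P<href>.+?)".*?title="(?P<title>.+?)".*?>')
links = [m.groupdict() for m in named.finditer(html)]
print(links)
```

Regular expressions work for simple, regular markup like this. For arbitrary real-world HTML, a dedicated HTML parser is usually more robust.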

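Python's support for regular expressions

Python provides the re module for regular expression operations. The core functions of the re module are listed below.

Function	Description
`compile(pattern, flags=0)`	Compiles a regular expression and returns a regular expression object
`match(pattern, string, flags=0)`	Matches the pattern at the start of the string; returns a match object on success, otherwise `None`
`search(pattern, string, flags=0)`	Searches the string for the first occurrence of the pattern; returns a match object on success, otherwise `None`
`split(pattern, string, maxsplit=0, flags=0)`	Splits the string using the pattern as the separator and returns a list
`sub(pattern, repl, string, count=0, flags=0)`	Replaces the parts of the string that match the pattern with the given string; `count` limits the number of replacements
`fullmatch(pattern, string, flags=0)`	The full-match version of `match` (from the start of the string to the end)
`findall(pattern, string, flags=0)`	Finds all matches of the pattern in the string and returns them as a list of strings (a list of tuples if the pattern has more than one capture group)
`finditer(pattern, string, flags=0)`	Finds all matches of the pattern in the string and returns an iterator of match objects
`purge()`	Clears the cache of implicitly compiled regular expressions
`re.I` / `re.IGNORECASE`	Flag for case-insensitive matching
`re.M` / `re.MULTILINE`	Flag for multi-line matching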
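The functions and flags in the table that are not exercised elsewhere in this summary can be tried out as follows. This is a short sketch, and all input strings are made up for illustration.

```python
import re

# split: use commas or semicolons, with optional surrounding whitespace, as separators
parts = re.split(r'\s*[,;]\s*', 'apple, banana;cherry ,  date')
print(parts)

# sub with count: replace only the first two digits
masked = re.sub(r'\d', '#', 'a1b2c3', count=2)
print(masked)

# re.M: ^ matches at the start of every line, not only at the start of the string
text = 'first line\nsecond line'
without_flag = re.findall(r'^\w+', text)
with_flag = re.findall(r'^\w+', text, flags=re.M)
print(without_flag, with_flag)

# re.I: case-insensitive matching
found = re.findall(r'python', 'Python PYTHON python', flags=re.I)
print(found)
```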

Filtering inappropriate content

import re

content = '马化腾是一个沙雕煞笔，FUck you！'
pattern = re.compile(r'[傻沙煞][逼笔雕鄙]|马化腾|fuck|shit', flags=re.IGNORECASE)
# modified_content = re.sub(r'[傻沙煞][逼笔雕鄙]|马化腾|fuck|shit', '*', content, flags=re.I)
modified_content = pattern.sub('*', content)
print(modified_content)
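`sub` also accepts a function as the replacement, which allows each match to be masked with as many asterisks as it has characters. The sketch below is a possible variation on the filter above. The word list and the helper name `mask` are placeholders chosen for illustration.

```python
import re

# Hypothetical, deliberately harmless word list
pattern = re.compile(r'darn|heck|spam', flags=re.IGNORECASE)


def mask(match):
    """Replace the matched word with asterisks of the same length."""
    return '*' * len(match.group())


content = 'Darn, this SPAM is a heck of a mess.'
modified_content = pattern.sub(mask, content)
print(modified_content)
```

Keeping the original length can make it easier to see how much text was filtered, at the cost of hinting at the hidden word.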

