Python: how do I filter out strings containing http with a regex or BeautifulSoup? Thanks

Posted 2023-03-29



/a.asp
/1.asp
http://www.baidu.com/
http://www.sina.com.cn/
http://a.com/
http://www.51.la/

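Reference answer A: Use ordinary string operations: test each string with a conditional that checks whether it contains 'http', and keep only the strings that do not.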
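As a minimal sketch of that answer, the snippet below filters the sample list from the question in two ways: a plain substring test, and a regex anchored at the start of the string. The names `links`, `drop_http_substring`, `drop_http_urls`, and `HTTP_PREFIX` are illustrative assumptions, not part of the original question.

```python
import re

# Sample data copied from the question.
links = [
    "/a.asp",
    "/1.asp",
    "http://www.baidu.com/",
    "http://www.sina.com.cn/",
    "http://a.com/",
    "http://www.51.la/",
]


def drop_http_substring(items):
    """Keep only strings that do not contain 'http' anywhere."""
    return [s for s in items if "http" not in s]


# Assumption: "with http" means an absolute http:// or https:// URL.
HTTP_PREFIX = re.compile(r"^https?://", re.IGNORECASE)


def drop_http_urls(items):
    """Keep only strings that do not start with http:// or https://."""
    return [s for s in items if not HTTP_PREFIX.match(s)]


print(drop_http_substring(links))
print(drop_http_urls(links))
```

The substring test is the simplest option, but it also drops a relative path such as `/http-guide.asp`. The anchored regex only drops strings that actually start with an http scheme, which may be closer to what is wanted when the list comes from `href` attributes.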

Finding text with Python and regex using BeautifulSoup and lxml

[Title]: Find with text using python and regex BeautifulSoup lxml [Posted]: 2021-12-24 07:10:48 [Question]:

How can I find elements by their text? For example:

<td>
    <span class="data-list clear-none">active</span>
    <span class="data-list clear-none">not active</span>
    <span class="data-list clear-none">null</span>
    <span class="data-list clear-none">none</span>
</td>

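I only want the spans of this class that contain active. What is the best way to do that? This is what I am trying now:

current_lifter = soup.findAll("span", {"class": "data-list clear-none"})

But I only want the ones whose text starts with active.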

[Question comments]:

[Answer 1]:

Pass string='active' as an argument to find (or findAll):

>>> soup.find("span", {"class": "data-list clear-none"}, string='active')
<span class="data-list clear-none">active</span>

Is there anything like active*? I mean, matching everything that comes after active.

import re

soup.find("span", {"class": "data-list clear-none"}, string=re.compile('^active.*'))

See the documentation: string also accepts a compiled regular expression.
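The regex variant above can be tried end to end with the sketch below. It assumes `bs4` is installed and uses the built-in `html.parser`; the fifth span ("active since 2020") and the variable names are added purely for illustration and are not in the original question.

```python
import re

from bs4 import BeautifulSoup

# Markup from the question, plus one hypothetical extra span
# ("active since 2020") to show the prefix match.
html = """
<td>
    <span class="data-list clear-none">active</span>
    <span class="data-list clear-none">not active</span>
    <span class="data-list clear-none">null</span>
    <span class="data-list clear-none">none</span>
    <span class="data-list clear-none">active since 2020</span>
</td>
"""

soup = BeautifulSoup(html, "html.parser")

# string= accepts a compiled pattern; '^active' keeps only spans whose
# text starts with "active", so "not active" is excluded.
matches = soup.find_all(
    "span",
    {"class": "data-list clear-none"},
    string=re.compile(r"^active"),
)
texts = [span.get_text() for span in matches]

# Whatever follows the leading "active" in each matched text.
remainders = [re.match(r"^active(.*)", t).group(1).strip() for t in texts]

print(texts)
print(remainders)
```

The trailing `.*` in the original pattern is not required for the match itself, because the pattern is applied as a search; it only matters when the remainder is captured in a group, as in `remainders` here.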

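[Comments]:

Great. Is there anything like active*? I mean, matching everything that comes after active.

[Answer 2]:

current_lifter = soup.find_all("span", class_="data-list clear-none", string='active')

Then iterate over the results to get the text of each one.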
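A short sketch of that iteration step, assuming `bs4` is installed and reusing the markup from the question. Note the keyword is `class_` (with a trailing underscore), because `class` is a reserved word in Python. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Markup from the question.
html = """
<td>
    <span class="data-list clear-none">active</span>
    <span class="data-list clear-none">not active</span>
    <span class="data-list clear-none">null</span>
    <span class="data-list clear-none">none</span>
</td>
"""

soup = BeautifulSoup(html, "html.parser")

# A plain string passed to string= must equal the span's text exactly.
current_lifter = soup.find_all(
    "span", class_="data-list clear-none", string="active"
)

# Loop over the matched tags and collect their text.
lifter_texts = []
for span in current_lifter:
    lifter_texts.append(span.get_text(strip=True))

print(lifter_texts)
```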

