How to parse "data-text" with BeautifulSoup? [duplicate]

Posted 2023-02-23

【Title】: How to parse "data-text" with beautifulsoup? [duplicate] 【Posted】: 2020-11-06 11:55:27 【Question】:

Given this piece of HTML:

<div class="some div name" data-text="important text">...</div>

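I need to get the text stored in the "data-text" attribute. I looked through the official BeautifulSoup documentation but could not find anything on this (or I did not look carefully enough).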

【Question comments】:

1. Pass your HTML to Beautiful Soup: Ldata = BeautifulSoup("...", "html.parser"); 2. Get the attribute you are looking for from the div: ldataText = Ldata.div.attrs['data-text'];

【Answer 1】:

You only need to replace 'href' with 'data-text' in this code:

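import re
from urllib.request import urlopen

# Download the page and decode the raw bytes into a string.
html = urlopen("http://kite.com")
text = html.read()
plaintext = text.decode('utf8')
# Capture whatever sits between the quotes that follow href=.
links = re.findall("href=[\"\'](.*?)[\"\']", plaintext)
print(links[:5])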

https://kite.com/python/answers/how-to-get-href-links-from-urllib-urlopen-in-python
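As a rough sketch of what that substitution looks like, the snippet below applies the same pattern to the markup from the question instead of a downloaded page. The variable names `plaintext` and `values` are illustrative only, and the regex approach is fragile: it assumes the attribute value is quoted and does not itself contain a quote character.

```python
import re

# Sample markup from the question; in practice this would be the decoded page source.
plaintext = '<div class="some div name" data-text="important text">...</div>'

# Same pattern as above, with 'href' swapped for 'data-text'.
values = re.findall(r"data-text=[\"'](.*?)[\"']", plaintext)
print(values)  # ['important text']
```

For anything beyond a quick one-off, an HTML parser is usually the safer choice, since it handles quoting and entity decoding for you.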

【Comments】:

【Answer 2】:

You can use ['data-text'] or .get('data-text') on a tag to get the attribute value.

For example:

from bs4 import BeautifulSoup

txt = '''<div class="some div name" data-text="important text">...</div>'''
soup = BeautifulSoup(txt, 'html.parser')

print(soup.find('div', {'data-text': True})['data-text'])

Prints:

important text
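The two access styles behave differently when the attribute is missing, which may matter on real pages. The sketch below adds a second, hypothetical div without `data-text` to the sample markup to show this: subscripting raises `KeyError`, while `.get()` returns `None` or a default you supply.

```python
from bs4 import BeautifulSoup

# The second div is a made-up addition with no data-text attribute.
txt = ('<div class="some div name" data-text="important text">...</div>'
       '<div class="other">...</div>')
soup = BeautifulSoup(txt, 'html.parser')

first, second = soup.find_all('div')

print(first['data-text'])              # important text
print(second.get('data-text'))         # None
print(second.get('data-text', 'n/a'))  # n/a

# Subscript access on a missing attribute raises KeyError.
try:
    second['data-text']
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)  # True
```

Use `.get()` when the attribute is optional, and subscripting when its absence should be treated as an error.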

【Comments】:

Thanks! This is exactly what I was looking for!

【Answer 3】:

You can try this.

from bs4 import BeautifulSoup
import requests
...
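bsObj = BeautifulSoup(html, features="html.parser")
# Match the div whose class attribute is exactly "some div name".
div_tag = bsObj.find("div", class_="some div name")
if div_tag:
    data_text = div_tag['data-text']
    # Print inside the if so a missing div does not cause a NameError.
    print(data_text)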

Hope this helps.

【Comments】:

【Answer 4】:

With so little information provided, I cannot come up with anything better than this:

from bs4 import BeautifulSoup

html = '<div class="some div name" data-text="important text">...</div>'
soup = BeautifulSoup(html, 'html.parser')
div = soup.select_one('div.some.div.name')
print(div.get('data-text'))

Output:

important text
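If the page contains several such elements, the same idea extends to a CSS attribute selector. This is only a sketch: the markup and the `item` class below are invented for illustration, and `div[data-text]` simply selects every div that carries the attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with several divs, one of them lacking data-text.
html = '''
<div class="item" data-text="first">...</div>
<div class="item">...</div>
<div class="item" data-text="second">...</div>
'''
soup = BeautifulSoup(html, 'html.parser')

# select() returns every matching tag; the selector skips divs without the attribute.
texts = [div['data-text'] for div in soup.select('div[data-text]')]
print(texts)  # ['first', 'second']
```

Because the selector already filters on the attribute, subscript access inside the list comprehension cannot raise `KeyError` here.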

【Comments】:
