BeautifulSoup: finding the URL in the result of find_all

Posted 2023-02-18


【Title】BeautifulSoup find the url out of the result of the find_all 【Posted】2021-01-06 00:59:06 【Question】:

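import urllib.request
from bs4 import BeautifulSoup

url = 'http://www.xxx'  # placeholder URL from the question
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')

# find_all returns a list-like ResultSet of every element with id="contents"
s1 = soup.find_all(id="contents")
print(s1, "\n")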

Output of find_all:

[<div id="contents" style="width:1000px;padding:10px 0;overflow:hidden;"><table style="margin:0;width:1000px;overflow:hidden;" >
<tr><td style="text-align:center;">
<img src="http://xxx/shop/data/editor/2020090302-01.jpg"/></td></tr></table>
</div>]

How can I get the src of the img tag from this result? Is there a way to get the URL other than through the id="contents" option? I only want the URL from the result.

【Comments】:

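Can you add the exact URL you are trying to scrape?

cobaro.co.kr/shop/goods/… Here you go! From the URL above, what I want is the URL of the image! It is [cobaro.co.kr/shop/data/editor/2020090302-01.jpg"/></…

Remember that to break a line of text you can put two spaces at the end of the line. Starting a new paragraph for no reason (a blank line between lines of text) is not recommended, because it takes up too much space on the page.

【Answer 1】: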

You can get the src of the img inside the div like this:

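from bs4 import BeautifulSoup as bs
import urllib.request  # a bare "import urllib" does not reliably expose urllib.request

url = 'http://www.cobaro.co.kr/shop/goods/goods_view.php?goodsno=8719&category=003004'
html = urllib.request.urlopen(url).read()
soup = bs(html, 'html.parser')
divs = soup.find_all(id="contents")

for div in divs:
    # find returns the first <img> inside this div, or None if there is none
    img_tag = div.find('img')
    if img_tag is not None:
        print(img_tag['src'])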

Output:

http://cobaro.co.kr/shop/data/editor/2020090302-01.jpg
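
The question also asks whether the URL can be reached without going through id="contents". A minimal sketch of one option, using BeautifulSoup's CSS selector support (`select`), is below. It parses the HTML fragment copied from the question's find_all output instead of fetching the live page, and the helper name `image_urls` is hypothetical, introduced only for this illustration.

```python
from bs4 import BeautifulSoup

# HTML copied from the find_all output in the question, so no network access is needed.
HTML = """
<div id="contents" style="width:1000px;padding:10px 0;overflow:hidden;"><table style="margin:0;width:1000px;overflow:hidden;" >
<tr><td style="text-align:center;">
<img src="http://xxx/shop/data/editor/2020090302-01.jpg"/></td></tr></table>
</div>
"""


def image_urls(html, selector="#contents img"):
    """Return the src of every <img> matched by a CSS selector (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # .get("src") returns None instead of raising KeyError when the attribute is missing,
    # so tags without a usable src are skipped.
    return [img.get("src") for img in soup.select(selector) if img.get("src")]


# Images nested anywhere inside the element with id="contents".
urls_in_contents = image_urls(HTML)

# Without relying on id="contents": every <img> in the document that has a src attribute.
all_urls = image_urls(HTML, "img[src]")

print(urls_in_contents)
print(all_urls)
```

On this fragment both selectors match the same single image. On a full page, "img[src]" would likely also pick up logos, icons, and other unrelated images, which is why scoping the search to a container such as "#contents" is usually the safer choice.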

【Comments】:

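Sure! I will upvote! By the way, this is my first question, so... I should try again in 2 weeks, right? I found that I cannot vote yet and have to wait 2 weeks.

@SOOKIM If my answer helped you, please consider upvoting and accepting it so that other users can find it useful too.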
