What is the difference between find() and find_all() in Beautiful Soup (Python)?

【Title】What is the difference between find() and find_all() in Beautiful Soup (Python)? 【Posted】2020-05-03 23:04:39 【Question】:

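I am doing web scraping, but I am stuck and confused about find() and find_all().

For example, where should I use find_all, and where should I use find()?

Also, how can I use these methods in a for loop over a ul li list?

Here is the code I tried: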


from bs4 import BeautifulSoup
import requests

urls = "https://www.flipkart.com/offers-list/latest-launches?screen=dynamic&pk=themeViews%3DAug19-Latest-launch-Phones%3ADTDealcard~widgetType%3DdealCard~contentType%3Dneo&wid=7.dealCard.OMU_5&otracker=hp_omu_Latest%2BLaunches_5&otracker1=hp_omu_WHITELISTED_neo%2Fmerchandising_Latest%2BLaunches_NA_wc_view-all_5"

source = requests.get(urls)

soup = BeautifulSoup(source.content, 'html.parser')

divs = soup.find_all('div', class_='MDGhAp')

names = divs.find_all('a')

full_name = names.find_all('div', class_='iUmrbN').text

print(full_name)

I get this error:

  File "C:/Users/ASUS/Desktop/utube/sunil.py", line 9, in <module>
    names = divs.find_all('a')
  File "C:\Users\ASUS\AppData\Local\Programs\Python\Python38-32\lib\site-packages\bs4\element.py", line 1601, in __getattr__
    raise AttributeError(

AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

So can anyone explain where I should use find and where I should use find_all?

【Answer 1】:

This example may make it clearer:

from bs4 import BeautifulSoup
import re

html = """
<ul>
<li>First</li>
<li>Second</li>
<li>Third</li>
</ul>
"""   
soup = BeautifulSoup(html,'html.parser')

for n in soup.find('li'):
  # find() returns only the first <li>; looping over that Tag yields its children (here the text "First")
  print(n)

for n in soup.find_all('li'):
  # find_all() returns every matching <li> Tag
  print(n)

Result:

First

<li>First</li>
<li>Second</li>
<li>Third</li>

For more information, see https://www.crummy.com/software/BeautifulSoup/bs4/doc/#calling-a-tag-is-like-calling-find-all
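
For the ul/li loop the question asks about, one possible pattern is to take the single list with find() and then collect its items with find_all(). This is only a minimal sketch that reuses the HTML from this answer; the variable names `ul` and `items` are illustrative and not from the original post.

```python
from bs4 import BeautifulSoup

html = """
<ul>
<li>First</li>
<li>Second</li>
<li>Third</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag: the first <ul> in the document.
ul = soup.find("ul")

# find_all() called on that Tag returns every <li> inside it
# as a list-like ResultSet, which is safe to loop over.
items = [li.get_text() for li in ul.find_all("li")]
print(items)  # ['First', 'Second', 'Third']
```

Calling find_all() on the Tag returned by find() limits the search to that one list, which may matter on pages that contain several lists.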

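【Answer 2】:

find(): returns the first matching element, or None if nothing in the page matches. The return type is <class 'bs4.element.Tag'>.

find_all(): returns all matches (it scans the whole document and returns every result). The return type is <class 'bs4.element.ResultSet'>.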

from robobrowser import RoboBrowser
# Create the browser once, setting both options together
browser = RoboBrowser(history=True, parser='html.parser')
browser.open('http://www.***.com')
res = browser.find('h3')        # first matching <h3>
print(type(res), res)
print(" ")
res = browser.find_all('h3')    # all matching <h3> tags
print(type(res), res)
print(" ")
print("Iterating the Resultset")
print(" ")
for x, tag in enumerate(res):
  print(x, tag)
  print(" ")

Output:

<class 'bs4.element.Tag'> <h3><a href="https://***.com">current community</a>
</h3>

<class 'bs4.element.ResultSet'> [<h3><a href="https://***.com">current community</a>
</h3>, <h3>
your communities            </h3>, <h3><a href="https://stackexchange.com/sites">more stack exchange communities</a>
</h3>, <h3 class="w90 mx-auto ta-center p-ff-roboto-slab-bold fs-headline2 mb24">Questions are everywhere, answers are on Stack Overflow</h3>, <h3 class="w90 mx-auto ta-center p-ff-roboto-slab-bold fs-headline2 mb24">Learn and grow with Stack Overflow</h3>, <h3 class="mx-auto w90 wmx12 p-ff-roboto-slab-bold fs-headline2 mb24 lg:ta-center">Looking for a job?</h3>]

Iterating the Resultset

0 <h3><a href="https://***.com">current community</a>
</h3>

1 <h3>
your communities            </h3>

2 <h3><a href="https://stackexchange.com/sites">more stack exchange communities</a>
</h3>

3 <h3 class="w90 mx-auto ta-center p-ff-roboto-slab-bold fs-headline2 mb24">Questions are everywhere, answers are on Stack Overflow</h3>

4 <h3 class="w90 mx-auto ta-center p-ff-roboto-slab-bold fs-headline2 mb24">Learn and grow with Stack Overflow</h3>

5 <h3 class="mx-auto w90 wmx12 p-ff-roboto-slab-bold fs-headline2 mb24 lg:ta-center">Looking for a job?</h3>
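
The return types described in this answer can also be checked offline. The sketch below uses plain BeautifulSoup instead of RoboBrowser and a made-up two-heading HTML string (both are assumptions for illustration), and it also shows what each method gives back when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, not taken from the page used in this answer.
html = "<h3>one</h3><h3>two</h3>"
soup = BeautifulSoup(html, "html.parser")

first = soup.find("h3")       # a single Tag (the first match)
every = soup.find_all("h3")   # a ResultSet, which behaves like a list
print(type(first).__name__, type(every).__name__)

# When nothing matches, find() returns None and find_all() returns an empty result.
missing_one = soup.find("h4")
missing_all = soup.find_all("h4")
print(missing_one, len(missing_all))
```

Because find() can return None, it is usually worth checking the result before accessing attributes such as .text on it.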

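【Answer 3】:

I found this in the Beautiful Soup documentation. If you are scraping something specific, try find; if you are scraping something more general, such as every a or span tag, try find_all. https://www.crummy.com/software/BeautifulSoup/bs4/doc/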

soup.find_all('a')
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.find(id="link3")
# <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
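
The two calls above assume a `soup` object that already exists. A self-contained sketch is given below; the HTML string is an assumption reconstructed from the output shown in this answer, not the full document used in the Beautiful Soup documentation.

```python
from bs4 import BeautifulSoup

# Assumed markup, rebuilt from the three links printed above.
html = """
<p>
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

# General: every <a> tag, then read one attribute from each.
hrefs = [a["href"] for a in soup.find_all("a")]
print(hrefs)

# Specific: the single element whose id is "link3".
link3 = soup.find(id="link3")
print(link3.get_text(), link3["href"])
```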

Hope this helps!

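【Answer 4】:

Let's understand this with the help of an example: I am trying to get the list of book titles on the following website. (https://www.bookdepository.com/bestsellers)

To go through all the book-related tags at once, I use find_all, and then I use find within each list item to get the book title.

(Note: find fetches the first match, which is the only match in this case, whereas find_all produces a list of all matches that you can then iterate over.)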

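from bs4 import BeautifulSoup as bs
import requests

url = "https://www.bookdepository.com/bestsellers"
response = requests.get(url)
# Parse the downloaded HTML (this step was missing from the original snippet)
soup = bs(response.content, "html.parser")

Use find_all to collect all the book items:

a = soup.find_all("div", class_="item-info")

Use find inside each book item to get that book's title:

for i in a:
    print(i.find("h3", class_="title").get_text())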
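
The same find_all-then-find pattern can be tried without a network request, and it is also what resolves the AttributeError in the question. The sketch below uses a hypothetical HTML string that only imitates the `item-info` and `title` class names used in this answer; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure this answer relies on.
html = """
<div class="item-info"><h3 class="title"> Book A </h3></div>
<div class="item-info"><h3 class="title"> Book B </h3></div>
<div class="item-info"><p>no title here</p></div>
"""
soup = BeautifulSoup(html, "html.parser")

items = soup.find_all("div", class_="item-info")
# items.find("h3") would raise AttributeError: a ResultSet is a list of Tags,
# so find() and find_all() must be called on each Tag, not on the list itself.

titles = []
for item in items:
    h3 = item.find("h3", class_="title")
    if h3 is not None:  # find() returns None when nothing matches
        titles.append(h3.get_text(strip=True))
print(titles)  # ['Book A', 'Book B']
```

The None check is a defensive choice for items that lack a title; without it, calling get_text() on a missing match would fail.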

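【Answer 5】:

From the documentation:

The find_all() method scans the entire document looking for results, but sometimes you only want to find one result. If you know a document has only one such tag, it is a waste of time to scan the entire document looking for more. Rather than passing in limit=1 every time you call find_all, you can use the find() method. The following two lines of code are nearly equivalent: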

soup.find_all('title', limit=1)

soup.find('title')
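
The one difference between these two calls is the shape of the result: find_all() with limit=1 still returns a list, while find() returns the element itself. A minimal sketch, assuming a small made-up document:

```python
from bs4 import BeautifulSoup

# Hypothetical document with a single <title> tag.
html = "<html><head><title>The Dormouse's story</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

as_list = soup.find_all("title", limit=1)  # a list containing one Tag
as_tag = soup.find("title")                # the Tag itself

print(as_list)
print(as_tag)
print(as_list[0] == as_tag)
```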
