Scraping prices from the Udemy website with BeautifulSoup4 in Python 3

Posted 2023-02-15

Original title: Scraping prices with BeautifulSoup4 in Python3 Udemy Website
Asked: 2021-12-18 03:44:25

Question:

I am trying to extract the price and the number of students from a Udemy course page. I am on Windows, using Python 3.8 and BeautifulSoup in a conda environment.

Here is my code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.udemy.com/course/business-analysis-conduct-a-strategy-analysis/'
html = requests.get(url).content
bs = BeautifulSoup(html, 'lxml')
# look up the price and enrollment blocks by class and data-purpose
searchingprice = bs.find('div', {'class': 'price-text--price-part--2npPm udlite-clp-discount-price udlite-heading-xxl', 'data-purpose': 'course-price-text'})
searchingstudents = bs.find('div', {'class': '', 'data-purpose': 'enrollment'})
print(searchingprice)
print(searchingstudents)

However, I only get the student information, not the price. What am I doing wrong?

None
<div class="" data-purpose="enrollment">
13,490 students
</div>

(The original post included a screenshot of the website here.)

Thanks!

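Answer 1:

The price is not in the page source; the page fetches it with JavaScript after loading. We have to make the same request ourselves. The code below continues from your own code, where bs is already loaded.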

# get the id of the course
course_id = bs.body.attrs['data-clp-course-id']
# build the request URL; feel free to delete fields you do not need
link = f'https://www.udemy.com/api-2.0/pricing/?course_ids={course_id}&fields[pricing_result]=price,discount_price,list_price,price_detail,price_serve_tracking_id'
# fetch the data
res = requests.get(link).json()
print(res)
>>> {'courses': {'1596446': {'_class': 'pricing_result',
      'price_serve_tracking_id': 'rbNYz3yCSiS2G1J62gtSzg',
      'price': {'amount': 16.99, 'currency': 'EUR', 'price_string': '€16.99', 'currency_symbol': '€'},
      'list_price': {'amount': 119.99, 'currency': 'EUR', 'price_string': '€119.99', 'currency_symbol': '€'},
      'discount_price': {'amount': 17.0, 'currency': 'EUR', 'price_string': '€17', 'currency_symbol': '€'},
      'price_detail': {'amount': 119.99, 'currency': 'EUR', 'price_string': '€119.99', 'currency_symbol': '€'}}},
     'bundles':
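
Once the JSON is parsed, the price can be read from the nested dictionaries. The following is only a minimal sketch, assuming the response keeps the shape printed above (a 'courses' mapping keyed by the course id as a string, with each price field being a dictionary that has a 'price_string' key). The helper name extract_price and the trimmed sample dictionary are hypothetical and used for illustration only.

```python
from typing import Optional


def extract_price(pricing: dict, course_id) -> Optional[str]:
    """Return the display price for a course, or None if it is missing."""
    # the API output above keys courses by the id as a string
    course = pricing.get('courses', {}).get(str(course_id))
    if course is None:
        return None
    price = course.get('price') or {}
    return price.get('price_string')


# trimmed copy of the response shown above (assumed shape)
sample = {
    'courses': {
        '1596446': {
            'price': {'amount': 16.99, 'currency': 'EUR', 'price_string': '€16.99'},
            'list_price': {'amount': 119.99, 'currency': 'EUR', 'price_string': '€119.99'},
        }
    }
}

print(extract_price(sample, 1596446))  # price string of the known course
print(extract_price(sample, 42))       # unknown course id gives None
```

Using .get at each level means a missing course or field yields None instead of raising a KeyError, which is convenient if the response shape differs from what was assumed.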

Answer 2:

from bs4 import BeautifulSoup

# price markup as it appears in the rendered page
html = """<div class="price-text--container--103D9 udlite-clp-price-text" 
data-purpose="price-text-container"><div class="price-text--price-part--2npPm 
udlite-clp-discount-price udlite-heading-lg" 
data-purpose="course-price-text">
<span class="udlite-sr-only">Current price</span>
<span><span>$14.99</span></span></div>
<div class="price-text--price-part--2npPm price-text--original-price--1sDdx 
udlite-clp-list-price udlite-text-sm" data-purpose="original-price-container">
<div data-purpose="course-old-price-text"><span class="udlite-sr-only">Original Price</span>
<span><s><span>$99.99</span></s></span></div></div>
<div class="price-text--price-part--2npPm udlite-clp-percent-discount udlite-text-sm"
data-purpose="discount-percentage"><span class="udlite-sr-only">Discount</span><span>85% off</span>
</div></div>"""

soup = BeautifulSoup(html, 'lxml')
# find the children of the main div class
lst = soup.find('div', class_='price-text--container--103D9 udlite-clp-price-text').findChildren('span')
# list comprehension to find the span text that starts with $ and keep the first element
print([span.text for span in lst if span.text.startswith('$')][0])  # -> '$14.99'

Comments:

This does not work. Error: AttributeError: 'NoneType' object has no attribute 'findChildren'
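
That error means soup.find returned None: the container div was not in the HTML that was parsed. This is consistent with Answer 1, which says the price is loaded by JavaScript and is not in the source returned by requests. Below is a rough sketch of a lookup that guards against a missing element. It assumes the data-purpose attribute is a more stable hook than the class names, which look auto-generated. The function name find_price and both HTML snippets are hypothetical, and html.parser is used only to avoid depending on lxml.

```python
from typing import Optional

from bs4 import BeautifulSoup


def find_price(html: str) -> Optional[str]:
    """Return the first '$...' span text in the price block, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    # match on data-purpose rather than on the full class string
    container = soup.find('div', attrs={'data-purpose': 'course-price-text'})
    if container is None:
        # the element is absent, e.g. because the price is filled in by JavaScript
        return None
    for span in container.find_all('span'):
        text = span.get_text(strip=True)
        if text.startswith('$'):
            return text
    return None


with_price = ('<div data-purpose="course-price-text">'
              '<span class="udlite-sr-only">Current price</span>'
              '<span><span>$14.99</span></span></div>')
without_price = '<div data-purpose="enrollment">13,490 students</div>'

print(find_price(with_price))     # price block present
print(find_price(without_price))  # price block missing, so None
```

The check for '$' follows Answer 2 and would miss prices in other currencies, such as the EUR values in Answer 1's output.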
