Problem in my code when fetching product ID, brand name and images using BeautifulSoup
I am trying to fetch the product details of a sample product URL with the following code:
from time import sleep

import requests
from bs4 import BeautifulSoup


def get_soup(url):
    soup = None
    try:
        response = requests.get(url)
        if response.status_code == 200:
            html = response.content
            soup = BeautifulSoup(html, "html.parser")
    except Exception as exc:
        print("Unable to fetch data due to..", str(exc))
    finally:
        return soup


def get_product_details(url):
    soup = get_soup(url)
    sleep(1)
    try:
        product_shop = soup.find('div', attrs={"class": "buy"})
        if product_shop is not None:
            available_product_shop = soup.findAll('div')[2].find('span').text == "In Stock"
            if available_product_shop is not None:
                prod_details = dict()
                merchant_product_id = soup.find('div', attrs={'class': 'description'}).findAll('span')[3].text
                if merchant_product_id is not None:
                    prod_details['merchant_product_id'] = merchant_product_id
                check_brand = soup.find('div', attrs={'class': 'description'}).findAll('span')[2].find('a')
                if check_brand is not None:
                    prod_details['brand'] = check_brand.text
                prod_details['merchant_image_urls'] = ",".join(list(filter(None, map(
                    lambda x: x['href'].replace(",", "%2C"),
                    soup.find('div', attrs={'class': 'left'}).findAll('a')))))
                check_price = soup.find('span', attrs={"class": "price-old"})
                if check_price is not None:
                    prod_details['price'] = check_price.text.replace("SGD $", "")
                check_sale_price = soup.find('span', attrs={"class": "price-new"})
                if check_sale_price is not None:
                    prod_details['sale_price'] = check_sale_price.text.replace("SGD $", "")
                return prod_details
    except Exception as exc:
        print("Error..", str(exc))
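The problem with the code above is that I cannot get the brand value, and the product ID and image URLs are not fetched correctly either.
Can anyone take a look at my code and help me get the correct details?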
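Answer
OK, my approach to answering your question was to refactor, simplify, and fix the code. There are many improvements in how specific elements are targeted. It is cleaner and easier to understand. Feel free to ask me about any details you do not understand. Good luck with your project (: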
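As a small illustration of what targeting specific elements can look like, here is a minimal sketch that finds a value by its visible label instead of by position (for example the fourth span). The markup in SAMPLE_HTML and the helper names text_after_label and link_after_label are assumptions made up for this example; the real page may be structured differently.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on a product description block.
SAMPLE_HTML = """
<div class="description">
  <span>Brand:</span> <a href="/brand/britax">Britax</a><br>
  <span>Product Code:</span> ABC-123<br>
  <span>Availability:</span> <span class="stock">In Stock</span>
</div>
"""


def text_after_label(soup, label):
    """Return the stripped node that directly follows the <span> containing `label`."""
    caption = soup.find('span', string=re.compile(label))
    if caption is None or caption.next_sibling is None:
        return None
    # next_sibling is usually a text node here; str() also copes with a tag.
    return str(caption.next_sibling).strip()


def link_after_label(soup, label):
    """Return the text of the first <a> sibling after the <span> containing `label`."""
    caption = soup.find('span', string=re.compile(label))
    if caption is None:
        return None
    link = caption.find_next_sibling('a')
    return link.get_text(strip=True) if link is not None else None


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
product_code = text_after_label(soup, 'Product Code:')
brand = link_after_label(soup, 'Brand:')
print(product_code, brand)
```

Because the lookup is anchored on the label text, it keeps working if the site adds or removes an unrelated span, which is where index-based access such as findAll('span')[3] tends to break. Both helpers return None rather than raising when the label is missing.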
Code:
import pprint
import re

import requests
from bs4 import BeautifulSoup


def get_product_details(url):
    html = requests.get(url).text
    # The 'lxml' parser requires the lxml package to be installed.
    soup = BeautifulSoup(html, 'lxml')

    # Skip products that are not in stock (the function then returns None).
    if soup.select_one('.stock').text != 'In Stock':
        return

    # The product code is the text node right after its caption span.
    product_code_caption = soup.find('span', string=re.compile('Product Code:'))
    product_code = product_code_caption.next_sibling.strip()

    # The brand is the link that follows the "Brand:" caption span.
    brand_container = soup.find('span', string=re.compile('Brand:'))
    brand = brand_container.find_next_sibling('a').string

    # Every gallery thumbnail links to its full-size image.
    urls = [a['href'] for a in soup.select('.cloud-zoom-gallery')]

    old_price = soup.select_one('.price-old').text.replace('SGD $', '')
    new_price = soup.select_one('.price-new').text.replace('SGD $', '')

    prod_details = {
        'merchant_product_id': product_code,
        'brand': brand,
        'merchant_image_urls': urls,
        'price': old_price,
        'sale_price': new_price
    }
    return prod_details


pprint.pprint(get_product_details('http://www.infantree.net/shop/index.php?route=product/product&path=59_113&product_id=1070'))
Output:
{'brand': 'Britax',
 'merchant_image_urls': ['http://www.infantree.net/shop/image/cache/data/Britax '
                         'Products/Britax-Light-Travel-System_BlackThunder-683x1024-500x500.jpg',
                         'http://www.infantree.net/shop/image/cache/data/Britax '
                         'Products/Formula-One-Flame-Red1024x1024-510x510-500x500.jpg',
                         'http://www.infantree.net/shop/image/cache/data/Britax '
                         'Products/Formula-One-Cosmos-Black1024x1024-768x768-500x500.jpg',
                         'http://www.infantree.net/shop/image/cache/data/Britax '
                         'Products/Black-Thunder-Ocean-Blue1024x1024-768x768-500x500.jpg',
                         'http://www.infantree.net/shop/image/cache/data/Britax '
                         'Products/Black-Thunder-Flame-Red1024x1024-768x768-500x500.jpg',
                         'http://www.infantree.net/shop/image/cache/data/Britax '
                         'Products/Black-Thunder-Cosmos-Black1024x1024-768x768-500x500.jpg',
                         'http://www.infantree.net/shop/image/cache/data/Britax '
                         'Products/Formaula-One-Ocean-Blue1024x1024-510x510-500x500.jpg',
                         'http://www.infantree.net/shop/image/cache/data/Britax '
                         'Products/Olympian-Blue-Cosmos-Black1024x1024-510x510-500x500.jpg',
                         'http://www.infantree.net/shop/image/cache/data/Britax '
                         'Products/Olympian-Blue-Flame-Red1024x1024-768x768-500x500.jpg',
                         'http://www.infantree.net/shop/image/cache/data/Britax '
                         'Products/Olympian-Blue-Ocean-Blue1024x1024-100x100-500x500.jpg'],
 'merchant_product_id': 'BRITAX Light + i-Size Travel System',
 'price': '1,032.00',
 'sale_price': '699.00'}
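Note that merchant_image_urls comes back as a list here, whereas the code in the question built a single comma-separated string and escaped commas inside each URL as %2C. If that string form is still needed, a minimal sketch of the conversion could look like the following. The helper name join_image_urls and the example.com URLs are placeholders invented for this illustration.

```python
def join_image_urls(urls):
    """Join URLs with commas, escaping any comma inside a URL as %2C first."""
    # Drop empty or missing entries before escaping, so None never reaches replace().
    cleaned = [url.replace(',', '%2C') for url in urls if url]
    return ','.join(cleaned)


# Placeholder input: one URL containing a comma, plus empty and missing entries.
sample_urls = ['http://example.com/a,b.jpg', '', None, 'http://example.com/c.jpg']
joined = join_image_urls(sample_urls)
print(joined)
```

Escaping the commas first keeps the joined string unambiguous, so it can later be split on "," to recover the individual URLs.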