Python - Scraping with BeautifulSoup and Urllib
Posted: 2018-09-22 14:49:16

Question: I am trying to read a website, but unfortunately something goes wrong.
import bs4 as bs
import urllib.request
sauce = urllib.request.urlopen('https://csgoempire.com/withdraw').read()
soup = bs.BeautifulSoup(sauce,'lxml')
print(soup.find_all('p'))
Error:
Traceback (most recent call last):
File "F:/Informatika/Python3X/GamblinSitesBot/GamblingSitesBot.py", line 4, in <module>
sauce = urllib.request.urlopen('https://csgoempire.com/').read()
File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 532, in open
response = meth(req, response)
File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 570, in error
return self._call_chain(*args)
File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 504, in _call_chain
result = func(*args)
File "c:\users\edgaras\appdata\local\programs\python\python36\Lib\urllib\request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Process finished with exit code 1
Also, this code works fine on other websites, e.g. google.com.
Comments:

- I don't think this is the whole stack trace? If it is, please provide the full error.
- It seems the URL requires authentication, which is what raises the 403 error.
- Do you have a proxy?
- Possible duplicate of "HTTP error 403 in Python 3 Web Scraping"

Answer 1:

You can use the requests library to achieve the same thing. This works fine:
import bs4 as bs
import requests
sauce = requests.get('https://csgoempire.com/withdraw')
soup = bs.BeautifulSoup(sauce.content,'html.parser')
print(soup.find_all('p'))
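
A common cause of HTTP 403 with urllib is that the server rejects the default Python user agent string. Assuming the block here is User-Agent based (which may not hold for sites behind stronger bot protection), a minimal sketch that keeps urllib but sends a browser-like header:

import urllib.request
import bs4 as bs

# Send a browser-like User-Agent; many servers answer 403 to the default
# "Python-urllib/x.y" agent (assumption: that is the cause of the 403 here).
req = urllib.request.Request(
    'https://csgoempire.com/withdraw',
    headers={'User-Agent': 'Mozilla/5.0'}
)
sauce = urllib.request.urlopen(req).read()
soup = bs.BeautifulSoup(sauce, 'lxml')
print(soup.find_all('p'))

If the requests-based answer is also blocked, the same header can be passed via requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}).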