IOError: [Errno socket error] using BeautifulSoup
Posted: 2017-02-11 14:22:39

I'm trying to get data from the US Census website using Beautiful Soup with Python 2.7. This is the code I'm using:
import urllib
from bs4 import BeautifulSoup
url = "https://www.census.gov/quickfacts/table/PST045215/01"
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
However, this is the error I get:
IOError Traceback (most recent call last)
<ipython-input-5-47941f5ea96a> in <module>()
59
60 url = "https://www.census.gov/quickfacts/table/PST045215/01"
---> 61 html = urllib.urlopen(url).read()
62 soup = BeautifulSoup(html)
63
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.pyc in urlopen(url, data, proxies, context)
85 opener = _urlopener
86 if data is None:
---> 87 return opener.open(url)
88 else:
89 return opener.open(url, data)
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.pyc in open(self, fullurl, data)
211 try:
212 if data is None:
--> 213 return getattr(self, name)(url)
214 else:
215 return getattr(self, name)(url, data)
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.pyc in open_https(self, url, data)
441 if realhost: h.putheader('Host', realhost)
442 for args in self.addheaders: h.putheader(*args)
--> 443 h.endheaders(data)
444 errcode, errmsg, headers = h.getreply()
445 fp = h.getfile()
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.pyc in endheaders(self, message_body)
1051 else:
1052 raise CannotSendHeader()
-> 1053 self._send_output(message_body)
1054
1055 def request(self, method, url, body=None, headers={}):
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.pyc in _send_output(self, message_body)
895 msg += message_body
896 message_body = None
--> 897 self.send(msg)
898 if message_body is not None:
899 #message_body was not a string (i.e. it is a file) and
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.pyc in send(self, data)
857 if self.sock is None:
858 if self.auto_open:
--> 859 self.connect()
860 else:
861 raise NotConnected()
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.pyc in connect(self)
1276
1277 self.sock = self._context.wrap_socket(self.sock,
-> 1278 server_hostname=server_hostname)
1279
1280 __all__.append("HTTPSConnection")
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.pyc in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname)
351 suppress_ragged_eofs=suppress_ragged_eofs,
352 server_hostname=server_hostname,
--> 353 _context=self)
354
355 def set_npn_protocols(self, npn_protocols):
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.pyc in __init__(self, sock, keyfile, certfile, server_side, cert_reqs, ssl_version, ca_certs, do_handshake_on_connect, family, type, proto, fileno, suppress_ragged_eofs, npn_protocols, ciphers, server_hostname, _context)
599 # non-blocking
600 raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
--> 601 self.do_handshake()
602
603 except (OSError, ValueError):
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.pyc in do_handshake(self, block)
828 if timeout == 0.0 and block:
829 self.settimeout(None)
--> 830 self._sslobj.do_handshake()
831 finally:
832 self.settimeout(timeout)
IOError: [Errno socket error] [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:590)
I looked for solutions in two Stack Overflow posts, this and this, but they did not solve the problem.
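A likely cause, though the question does not confirm it: `SSLV3_ALERT_HANDSHAKE_FAILURE` means the server aborted the TLS handshake with a handshake_failure alert, which commonly happens when the interpreter is linked against an old OpenSSL that cannot negotiate the protocol version or cipher suites the server requires. The sketch below only inspects the local `ssl` module (no network calls) and reports the linked OpenSSL version and whether TLS 1.2 is available. `tls_report` is a hypothetical helper name; the code is written to run under Python 2.7.9+ as well as Python 3.

```python
import ssl


def tls_report():
    """Summarise the OpenSSL build linked into this interpreter (no network access)."""
    if hasattr(ssl, "HAS_TLSv1_2"):
        # Python 3.7+ exposes an explicit capability flag.
        has_tls12 = ssl.HAS_TLSv1_2
    else:
        # Python 2.7.9+ and 3.4+ only define PROTOCOL_TLSv1_2 when the linked
        # OpenSSL (1.0.1 or later) supports TLS 1.2.
        has_tls12 = hasattr(ssl, "PROTOCOL_TLSv1_2")
    return {
        "openssl_version": ssl.OPENSSL_VERSION,
        "openssl_version_info": ssl.OPENSSL_VERSION_INFO,
        "has_tls12": bool(has_tls12),
    }


if __name__ == "__main__":
    for key, value in sorted(tls_report().items()):
        print("{}: {}".format(key, value))
```

If `has_tls12` is `False`, or the version string shows an OpenSSL older than 1.0.1, the handshake failure is consistent with this explanation.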
Answer 1: One way to solve this is to switch to requests:
import requests
from bs4 import BeautifulSoup
url = "https://www.census.gov/quickfacts/table/PST045215/01"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
print(soup.title.get_text())
Prints:
Alabama QuickFacts from the US Census Bureau
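A slightly more defensive variant of the answer's code, offered as a sketch: it adds a request timeout and `raise_for_status()` so an HTTP error page is not silently parsed, and keeps the explicit `"html.parser"` argument. `fetch_soup` and `page_title` are hypothetical helper names, and the 10-second timeout is an arbitrary assumption.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10):
    """Download url with requests and parse it with Python's built-in html.parser."""
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return BeautifulSoup(response.content, "html.parser")


def page_title(soup):
    """Return the stripped text of the <title> element, or None if there is none."""
    return soup.title.get_text(strip=True) if soup.title is not None else None


if __name__ == "__main__":
    # Requires network access; the URL is the one from the question.
    soup = fetch_soup("https://www.census.gov/quickfacts/table/PST045215/01")
    print(page_title(soup))
```

The title printed by the `__main__` part depends on what census.gov serves at the time it is run.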
Note that this may also require installing the requests[security] package:
pip install requests[security]
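To check whether the packages behind `requests[security]` are present in the current environment, the stdlib-only sketch below tests whether each module can be found, without importing it. The module list is an assumption based on what the extra pulled in around the time of this question (pyOpenSSL, importable as `OpenSSL`, plus `cryptography` and `idna`); newer `requests` releases may no longer pull anything in through this extra. `security_extra_status` is a hypothetical name, and `importlib.util.find_spec` requires Python 3.

```python
import importlib.util

# Assumed contents of the requests "security" extra circa 2017.
# pyOpenSSL is imported under the top-level name "OpenSSL".
SECURITY_EXTRA_MODULES = ("OpenSSL", "cryptography", "idna")


def security_extra_status(modules=SECURITY_EXTRA_MODULES):
    """Map each top-level module name to True if it can be found, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, present in security_extra_status().items():
        print("{:<14} {}".format(name, "installed" if present else "missing"))
```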
Comments:
Hi Alex. Yes, your approach works once I run pip install requests[security]. Thanks a lot.