IOError: [Errno socket error] using BeautifulSoup

Posted: 2017-02-11 14:22:39

Question:

I am trying to fetch data from the US Census website using Beautiful Soup with Python 2.7. Here is the code I am using:

import urllib
from bs4 import BeautifulSoup

url = "https://www.census.gov/quickfacts/table/PST045215/01"
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)

However, this is the error I get:

IOError                                   Traceback (most recent call last)
<ipython-input-5-47941f5ea96a> in <module>()
     59 
     60 url = "https://www.census.gov/quickfacts/table/PST045215/01"
---> 61 html = urllib.urlopen(url).read()
     62 soup = BeautifulSoup(html)
     63 

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.pyc in urlopen(url, data, proxies, context)
     85         opener = _urlopener
     86     if data is None:
---> 87         return opener.open(url)
     88     else:
     89         return opener.open(url, data)

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.pyc in open(self, fullurl, data)
    211         try:
    212             if data is None:
--> 213                 return getattr(self, name)(url)
    214             else:
    215                 return getattr(self, name)(url, data)

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.pyc in open_https(self, url, data)
    441             if realhost: h.putheader('Host', realhost)
    442             for args in self.addheaders: h.putheader(*args)
--> 443             h.endheaders(data)
    444             errcode, errmsg, headers = h.getreply()
    445             fp = h.getfile()

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.pyc in endheaders(self, message_body)
   1051         else:
   1052             raise CannotSendHeader()
-> 1053         self._send_output(message_body)
   1054 
   1055     def request(self, method, url, body=None, headers={}):

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.pyc in _send_output(self, message_body)
    895             msg += message_body
    896             message_body = None
--> 897         self.send(msg)
    898         if message_body is not None:
    899             #message_body was not a string (i.e. it is a file) and

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.pyc in send(self, data)
    857         if self.sock is None:
    858             if self.auto_open:
--> 859                 self.connect()
    860             else:
    861                 raise NotConnected()

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.pyc in connect(self)
   1276 
   1277             self.sock = self._context.wrap_socket(self.sock,
-> 1278                                                   server_hostname=server_hostname)
   1279 
   1280     __all__.append("HTTPSConnection")

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.pyc in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname)
    351                          suppress_ragged_eofs=suppress_ragged_eofs,
    352                          server_hostname=server_hostname,
--> 353                          _context=self)
    354 
    355     def set_npn_protocols(self, npn_protocols):

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.pyc in __init__(self, sock, keyfile, certfile, server_side, cert_reqs, ssl_version, ca_certs, do_handshake_on_connect, family, type, proto, fileno, suppress_ragged_eofs, npn_protocols, ciphers, server_hostname, _context)
    599                         # non-blocking
    600                         raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
--> 601                     self.do_handshake()
    602 
    603             except (OSError, ValueError):

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/ssl.pyc in do_handshake(self, block)
    828             if timeout == 0.0 and block:
    829                 self.settimeout(None)
--> 830             self._sslobj.do_handshake()
    831         finally:
    832             self.settimeout(timeout)

IOError: [Errno socket error] [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:590)

I looked for solutions in two other Stack Overflow questions, but they did not solve the problem.

Answer 1:

One way to work around this problem is to switch to requests:

import requests
from bs4 import BeautifulSoup

url = "https://www.census.gov/quickfacts/table/PST045215/01"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
print(soup.title.get_text())

Prints:

Alabama QuickFacts from the US Census Bureau

Note that this may also require installing the requests[security] package:

pip install requests[security]
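
To see why the extra package might matter, it can help to inspect the SSL library your interpreter is linked against. The sketch below rests on an assumption that the thread does not confirm: that the SSLV3_ALERT_HANDSHAKE_FAILURE comes from an old OpenSSL build that cannot negotiate what the server requires. The helper name describe_local_ssl is made up for illustration; it only reads attributes of the standard ssl module.

```python
from __future__ import print_function

import ssl


def describe_local_ssl():
    """Collect facts about the SSL library this interpreter is linked against.

    Hypothetical helper for illustration; it is not part of any library.
    """
    return {
        # Human-readable version string of the linked SSL library
        "openssl_version": ssl.OPENSSL_VERSION,
        # The same version as a tuple, convenient for comparisons
        "openssl_version_info": ssl.OPENSSL_VERSION_INFO,
        # True if the ssl module exposes a TLS 1.2 protocol constant
        "has_tls_1_2": hasattr(ssl, "PROTOCOL_TLSv1_2"),
        # True if Server Name Indication is supported
        # (the attribute may be missing on very old builds)
        "has_sni": getattr(ssl, "HAS_SNI", False),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_local_ssl().items()):
        print("{}: {}".format(name, value))
```

If the reported OpenSSL version is very old, or has_tls_1_2 is False, that would be consistent with the assumption above. As far as I know, the requests[security] extra installs pyOpenSSL and cryptography, which lets requests use a newer TLS stack than the one bundled with the interpreter.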

Comments:

Hi Alex. Yes, your approach does work after running pip install requests[security]. Thank you very much.
