Scrapy can't connect to an HTTPS site that only supports legacy TLSv1: connection lost
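Using Scrapy 1.6.0 (Twisted 18.9.0, pyOpenSSL 19.0.0, OpenSSL 1.0.2r, macOS 10.14.3). I've ruled out the user agent and robots.txt. It looks like a certificate negotiation problem. No web proxy is involved.
To reproduce: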
04:49:59 dork@Dorks-MacBook:~
0 $ scrapy shell
.
.
.
>>> fetch('https://www.labor.ny.gov')
2019-04-05 16:45:11 [scrapy.core.engine] INFO: Spider opened
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/Users/dork/project/venv/lib/python3.6/site-packages/scrapy/shell.py", line 115, in fetch
reactor, self._schedule, request, spider)
File "/Users/dork/project/venv/lib/python3.6/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
result.raiseException()
File "/Users/dork/project/venv/lib/python3.6/site-packages/twisted/python/failure.py", line 467, in raiseException
raise self.value.with_traceback(self.tb)
twisted.web._newclient.ResponseNeverReceived: [<twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost.>]
Connecting and negotiating directly with OpenSSL on the command line also seems to fail:
0 $ openssl version
OpenSSL 1.0.2r 26 Feb 2019
04:49:59 dork@Dorks-MacBook:~
0 $ openssl s_client -connect www.labor.ny.gov:443
CONNECTED(00000003)
4472571500:error:140790E5:SSL routines:ssl23_write:ssl handshake failure:s23_lib.c:177:
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake has read 0 bytes and written 307 bytes
---
New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
Protocol : TLSv1.2
Cipher : 0000
Session-ID:
Session-ID-ctx:
Master-Key:
Key-Arg : None
PSK identity: None
PSK identity hint: None
SRP username: None
Start Time: 1554497411
Timeout : 300 (sec)
Verify return code: 0 (ok)
---
However, if I force openssl to TLSv1, it seems to work. I just don't know how to force that through the chain Scrapy -> Twisted -> pyOpenSSL -> OpenSSL, or whether that's even possible.
04:49:59 dork@Dorks-MacBook:~
0 $ openssl s_client -tls1 -connect www.labor.ny.gov:443
CONNECTED(00000003)
depth=2 C = BE, O = GlobalSign nv-sa, OU = Root CA, CN = GlobalSign Root CA
verify return:1
depth=1 C = BE, O = GlobalSign nv-sa, CN = GlobalSign Organization Validation CA - SHA256 - G2
verify return:1
depth=0 C = US, ST = New York, L = Albany, O = New York State Office for Technology, CN = labor.ny.gov
verify return:1
---
Certificate chain
0 s:/C=US/ST=New York/L=Albany/O=New York State Office for Technology/CN=labor.ny.gov
i:/C=BE/O=GlobalSign nv-sa/CN=GlobalSign Organization Validation CA - SHA256 - G2
1 s:/C=BE/O=GlobalSign nv-sa/CN=GlobalSign Organization Validation CA - SHA256 - G2
i:/C=BE/O=GlobalSign nv-sa/OU=Root CA/CN=GlobalSign Root CA
---
Server certificate
-----BEGIN CERTIFICATE-----
.
.
.
Postman can't fetch the page either. It looks like anything that relies on OpenSSL silently dies.
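The TLS 1.0 observation can be checked from Python without Scrapy in the loop. Below is a minimal sketch that uses only the standard-library ssl module, which is itself backed by OpenSSL. It compares a default handshake with one pinned to TLS 1.0, which is roughly what `openssl s_client -tls1` does.

The helper names tls10_context and probe are hypothetical. Results depend on the local OpenSSL build. Newer OpenSSL releases may also refuse TLS 1.0 at their default security level, so a failed TLS 1.0 probe does not by itself say anything about the server.

```python
import socket
import ssl


def tls10_context() -> ssl.SSLContext:
    """Hypothetical helper: a client context pinned to TLS 1.0 (like s_client -tls1)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # certificate + hostname checks on
    ctx.load_default_certs()
    # Pin both ends of the allowed version range to TLS 1.0.
    ctx.minimum_version = ssl.TLSVersion.TLSv1
    ctx.maximum_version = ssl.TLSVersion.TLSv1
    return ctx


def probe(host: str, ctx: ssl.SSLContext, port: int = 443, timeout: float = 10.0):
    """Hypothetical helper: do a TLS handshake only; return (protocol version, cipher name)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]


if __name__ == "__main__":
    host = "www.labor.ny.gov"  # the site from the question
    for label, ctx in [("default", ssl.create_default_context()),
                       ("tls1.0", tls10_context())]:
        try:
            print(label, probe(host, ctx))
        except OSError as exc:  # ssl.SSLError and timeouts are OSError subclasses
            print(label, "failed:", exc)
```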
Incomplete answer; marked community wiki (CW) in case anyone can add the Scrapy (or related) part.
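Man, that server is bad! It supports only SSL2, SSL3 and TLS1.0. The first two are thoroughly broken, and SSL2 was already broken last century. It identifies itself as IIS/6.0, which goes with Windows Server 2003, long out of support.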
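FWIW, it is not actually version-intolerant. Nor is it hit by the over-256-byte ClientHello problem found in some buggy implementations a few years ago. If I use OpenSSL 1.0.2 to send a TLS1.2 ClientHello with the ciphers restricted to kRSA, it correctly negotiates down to TLS1.0.

It fails only on the default ClientHello of OpenSSL >= 1.0.2. That ClientHello carries a much larger cipher list than earlier versions did, because TLS1.2 added a whole bunch of new cipher suites for the new AEAD formats and the new PRF scheme. Forcing TLS1.0 has the same effect as the kRSA restriction, because it makes OpenSSL offer only the smaller list of cipher suites that are valid in TLS1.0. I vaguely recall XP-era bugs triggered by "large" cipher lists, which could be the problem here.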
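The kRSA experiment above can be approximated from Python's standard ssl module. The sketch caps the client at TLS 1.2 and cuts the TLS <= 1.2 cipher list down to RSA key-exchange suites using the OpenSSL cipher-string keyword "kRSA". It then compares the configured list with the default one.

This is an approximation under stated assumptions, not the exact command the answer used. The helper names small_hello_context and configured_pre13_suites are hypothetical. Newer Scrapy releases also accept a DOWNLOADER_CLIENT_TLS_CIPHERS setting that takes an OpenSSL cipher string, so a value such as "kRSA" could be tried there. Check that your Scrapy version documents that setting first; it is not assumed to exist in 1.6.

```python
import ssl


def small_hello_context(cipher_string: str = "kRSA") -> ssl.SSLContext:
    """Hypothetical helper: a ClientHello for up to TLS 1.2 that offers only
    RSA key-exchange suites (OpenSSL cipher-string keyword "kRSA")."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # certificate + hostname checks on
    ctx.load_default_certs()
    # Cap at TLS 1.2 so no TLS 1.3 suites or extensions are sent,
    # as with the OpenSSL 1.0.2 client used in the answer.
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    # set_ciphers() only affects TLS <= 1.2 suites; it raises ssl.SSLError
    # if the cipher string matches nothing.
    ctx.set_ciphers(cipher_string)
    return ctx


def configured_pre13_suites(ctx: ssl.SSLContext) -> list:
    """Hypothetical helper: names of the TLS <= 1.2 cipher suites configured on ctx."""
    return [c["name"] for c in ctx.get_ciphers() if c["protocol"] != "TLSv1.3"]


if __name__ == "__main__":
    default = configured_pre13_suites(ssl.create_default_context())
    small = configured_pre13_suites(small_hello_context())
    print(f"default: {len(default)} suites, kRSA only: {len(small)} suites")
```

A context built this way can be passed to any ssl-based client, for example ctx.wrap_socket(...), to test whether the smaller ClientHello gets through where the default one is dropped.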
It's not the certificate. The certificate is the only thing they got right.
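For the missing Scrapy part, here is a hedged sketch. Scrapy documents a DOWNLOADER_CLIENT_TLS_METHOD setting that selects the TLS method its default HTTP/1.1 download handler passes down through Twisted to pyOpenSSL. Setting it to "TLSv1.0" should be the Scrapy-level counterpart of `openssl s_client -tls1`. This has not been verified against www.labor.ny.gov itself.

```python
# settings.py of the Scrapy project (a sketch).
# DOWNLOADER_CLIENT_TLS_METHOD is a documented Scrapy setting for the default
# HTTP/1.1 download handler. Accepted values: "TLS" (default, negotiate the
# highest version the platform supports), "TLSv1.0", "TLSv1.1", "TLSv1.2",
# "SSLv3".
# Pinning TLS 1.0 is meant to mirror `openssl s_client -tls1`: OpenSSL then
# offers only the smaller set of cipher suites valid in TLS 1.0.
DOWNLOADER_CLIENT_TLS_METHOD = "TLSv1.0"
```

The same value can be tried from the shell with `scrapy shell -s DOWNLOADER_CLIENT_TLS_METHOD=TLSv1.0` followed by fetch('https://www.labor.ny.gov'). A project-wide value pins every HTTPS request to TLS 1.0, which will break sites that have disabled it. Putting the setting in the custom_settings dict of the one spider that crawls this site is probably the safer choice.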