Python requests.get: timeout for the entire response

Posted 2021-04-10


I'm gathering statistics on a list of websites, and I'm using requests for it for simplicity. Here is my code:

import requests

data = []
websites = ['http://google.com', 'http://bbc.co.uk']
for w in websites:
    r = requests.get(w, verify=False)
    data.append((r.url, len(r.content), r.elapsed.total_seconds(),
                 str([(l.status_code, l.url) for l in r.history]),
                 str(r.headers.items()), str(r.cookies.items())))

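Now, I want requests.get to time out after 10 seconds so the loop doesn't get stuck.

This question has attracted interest before as well, but none of the answers were clean. I will put a bounty on this to get a good answer.

I hear that not using requests might be a good idea, but then how would I get the nice things requests offers (the ones in the tuple)?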

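Answer

What about using eventlet? If you want to time out the request after 10 seconds, even while data is being received, this snippet will work for you: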

import requests
import eventlet
eventlet.monkey_patch()

with eventlet.Timeout(10):
    requests.get("http://ipv4.download.thinkbroadband.com/1GB.zip", verify=False)

Another answer

This code works for socket errors 11004 and 10060...

# -*- encoding:UTF-8 -*-
__author__ = 'ACE'
import requests
from PyQt4.QtCore import *
from PyQt4.QtGui import *


class TimeOutModel(QThread):
    Existed = pyqtSignal(bool)
    TimeOut = pyqtSignal()

    def __init__(self, fun, timeout=500, parent=None):
        """
        @param fun: function or lambda
        @param timeout: ms
        """
        super(TimeOutModel, self).__init__(parent)
        self.fun = fun

        self.timeer = QTimer(self)
        self.timeer.setInterval(timeout)
        self.timeer.timeout.connect(self.time_timeout)
        self.Existed.connect(self.timeer.stop)
        self.timeer.start()

        self.setTerminationEnabled(True)

    def time_timeout(self):
        self.timeer.stop()
        self.TimeOut.emit()
        self.quit()
        self.terminate()

    def run(self):
        self.fun()


bb = lambda: requests.get("http://ipv4.download.thinkbroadband.com/1GB.zip")

a = QApplication([])

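def on_timeout():
    print('timeout')


z = TimeOutModel(bb, 500)
z.TimeOut.connect(on_timeout)  # report when the timer fires
z.start()  # run the request in the worker thread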

a.exec_()

Another answer

Although the question is about requests, I find this very easy to do with pycurl's CURLOPT_TIMEOUT or CURLOPT_TIMEOUT_MS.

No threading or signaling required:

import traceback
from io import BytesIO

import pycurl

url = 'http://www.example.com/example.zip'
timeout_ms = 1000
raw = BytesIO()  # collects the response body as bytes
c = pycurl.Curl()
c.setopt(pycurl.TIMEOUT_MS, timeout_ms)  # total timeout in milliseconds
c.setopt(pycurl.WRITEFUNCTION, raw.write)
c.setopt(pycurl.NOSIGNAL, 1)
c.setopt(pycurl.URL, url)
c.setopt(pycurl.HTTPGET, 1)
try:
    c.perform()
except pycurl.error:
    traceback.print_exc() # error generated on timeout
    pass # or just pass if you don't want to print the error

Another answer

Just another solution (taken from http://docs.python-requests.org/en/master/user/advanced/#streaming-uploads)

Before downloading the body, you can check the content size:

import requests

TOO_LONG = 10*1024*1024  # 10 MB
big_url = "http://ipv4.download.thinkbroadband.com/1GB.zip"
r = requests.get(big_url, stream=True)
print(r.headers['content-length'])
# 1073741824

if int(r.headers['content-length']) < TOO_LONG:
    # download the body:
    content = r.content

But be careful: the sender can put an incorrect value in the "content-length" response field.
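Because the header can be missing or wrong, one option is to also cap the number of bytes actually read. The following is a minimal sketch and not part of the original answer: `read_capped` is a hypothetical helper that accepts any iterable of byte chunks, such as the one `r.iter_content(chunk_size=8192)` yields for a response opened with stream=True. A plain list stands in for the network here.

```python
def read_capped(chunks, max_bytes):
    """Join byte chunks, raising ValueError once more than max_bytes arrive.

    `chunks` is any iterable of bytes objects, for example
    r.iter_content(chunk_size=8192) on a response opened with stream=True.
    """
    parts = []
    received = 0
    for chunk in chunks:
        received += len(chunk)
        if received > max_bytes:
            raise ValueError(
                "body exceeds {} bytes, aborting download".format(max_bytes))
        parts.append(chunk)
    return b"".join(parts)


# Stand-in for a streamed response body: three 4-byte chunks.
fake_body = [b"abcd", b"efgh", b"ijkl"]
small = read_capped(fake_body, max_bytes=12)
```

The cap limits size only; it does not bound how long the download takes.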

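Another answer

If you really need a hard deadline, create a watchdog thread that disrupts requests' internal state after 10 seconds, for example one that:

closes the underlying socket, and ideally
triggers an exception if requests retries the operation

Note that, depending on the system libraries, you may be unable to set a deadline on DNS resolution.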
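The watchdog idea can be sketched with the standard library's threading.Timer. This is only an illustration under stated assumptions: `watchdog` is a hypothetical helper, and the `on_timeout` callback stands in for whatever tears down the connection. Here an Event is used in its place so the example runs without a network.

```python
import threading
from contextlib import contextmanager


@contextmanager
def watchdog(seconds, on_timeout):
    """Call on_timeout() from a timer thread unless the block ends first."""
    timer = threading.Timer(seconds, on_timeout)
    timer.daemon = True  # do not keep the interpreter alive for the timer
    timer.start()
    try:
        yield
    finally:
        timer.cancel()  # no effect if the timer has already fired


# Demonstration with an Event standing in for "close the connection".
fired = threading.Event()
with watchdog(0.05, fired.set):
    fired.wait(1.0)  # stands in for a blocking network read
```

With requests, one might pass the `close` method of a response opened with stream=True as the callback. Whether that interrupts a read that is already blocked depends on the platform, so treat it as an assumption to verify.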

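Another answer

Well, I tried many of the solutions on this page and still faced instability, random hangs, and poor connection performance.

I'm now using curl, and I'm really happy with its "max time" option and its overall performance, even with an implementation as crude as this: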

content=commands.getoutput('curl -m6 -Ss "http://mywebsite.xyz"')

Here, I set a maximum time of 6 seconds, which covers both connection and transfer time.

I'm sure curl has a nice Python binding if you prefer to stick to Pythonic syntax :)
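The `commands` module used in that one-liner exists only in Python 2. On Python 3 the same curl call can be made through `subprocess`; the following is a rough sketch, with `build_curl_command` and `fetch_with_curl` as hypothetical helper names. Passing the arguments as a list avoids shell quoting problems with the URL.

```python
import subprocess


def build_curl_command(url, max_time):
    """Return the argument list for a silent curl call with a total time limit."""
    # -m / --max-time: limit for the whole operation, in seconds
    # -sS: silent, but still report errors
    return ["curl", "-m", str(max_time), "-sS", url]


def fetch_with_curl(url, max_time=6):
    """Run curl and return the body as bytes; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_curl_command(url, max_time),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # curl exits with status 28 when --max-time is exceeded
    )
    return result.stdout


command = build_curl_command("http://mywebsite.xyz", 6)
```

Calling `fetch_with_curl("http://mywebsite.xyz")` would then run the command and return the body, provided curl is installed and on the PATH.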

Another answer

Set stream=True and use r.iter_content(1024). And yes, eventlet.Timeout somehow doesn't work for me.

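from time import time

from requests import exceptions, get

# `config` is assumed to be a dict loaded elsewhere, holding an online URL and a local fallback path.
try: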
    start = time()
    timeout = 5
    with get(config['source']['online'], stream=True, timeout=timeout) as r:
        r.raise_for_status()
        content = bytes()
        content_gen = r.iter_content(1024)
        while True:
            if time()-start > timeout:
                raise TimeoutError('Time out! ({} seconds)'.format(timeout))
            try:
                content += next(content_gen)
            except StopIteration:
                break
        data = content.decode().split('\n')
        if len(data) in [0, 1]:
            raise ValueError('Bad requests data')
except (exceptions.RequestException, ValueError, IndexError, KeyboardInterrupt,
        TimeoutError) as e:
    print(e)
    with open(config['source']['local']) as f:
        data = [line.strip() for line in f.readlines()]

Discussion here: https://redd.it/80kp1h

Another answer

If you use the stream=True option, you can do this:

import time

import requests

r = requests.get(
    'http://url_to_large_file',
    timeout=1,  # relevant only for underlying socket
    stream=True)

with open('/tmp/out_file.txt', 'wb') as f:
    start_time = time.time()
    for chunk in r.iter_content(chunk_size=1024):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
        if time.time() - start_time > 8:
            raise Exception('Request took longer than 8s')

This solution needs neither signals nor multiprocessing.

Another answer

There is a package called timeout-decorator that you can use to time out any Python function.

import time

import timeout_decorator


@timeout_decorator.timeout(5)
def mytest():
    print("Start")
    for i in range(1,10):
        time.sleep(1)
        print("{} seconds have passed".format(i))

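It uses the signals approach that some answers here suggest. Alternatively, you can tell it to use multiprocessing instead of signals (for example, if you are in a multi-threaded environment).

Another answer

I came up with a more direct solution that is admittedly ugly but fixes the real problem. It goes something like this: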

resp = requests.get(some_url, stream=True)
resp.raw._fp.fp._sock.settimeout(read_timeout)
# This will load the entire response even though stream is set
content = resp.content

The original answer links to a full explanation.

Another answer

Set the timeout parameter:

r = requests.get(w, verify=False, timeout=10)

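As long as you don't set stream=True on that request, this causes the call to requests.get() to time out if the connection takes more than ten seconds, or if the server doesn't send data for more than ten seconds. Note that this bounds each wait for data, not the total time of the download.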
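The read timeout applies to each wait for data rather than to the whole download, which is why a slow but steady server can run far past ten seconds. The toy model below illustrates this. It is not how requests is implemented, and `first_stall` and the gap lists are made up for the example.

```python
def first_stall(gaps, read_timeout):
    """Return the index of the first gap longer than read_timeout, or None.

    `gaps` lists the seconds of silence before each chunk arrives.
    """
    for index, gap in enumerate(gaps):
        if gap > read_timeout:
            return index
    return None


# A slow server that sends a chunk every 9 seconds, four times.
slow_but_steady = [9, 9, 9, 9]
# A server that goes quiet for 12 seconds before its second chunk.
stalls_once = [1, 12, 1]

steady_result = first_stall(slow_but_steady, read_timeout=10)
stall_result = first_stall(stalls_once, read_timeout=10)
total_seconds = sum(slow_but_steady)
```

In this model the steady server never trips the 10-second read timeout even though the transfer takes 36 seconds in total, while the second server trips it on its second chunk.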

Another answer

Update: http://docs.python-requests.org/en/master/user/advanced/#timeouts

In newer versions of requests:

If you specify a single value for the timeout, like this:

r = requests.get('https://github.com', timeout=5)

The timeout value is applied to both the connect and the read timeouts. Specify a tuple if you want to set the two values separately:

r = requests.get('https://github.com', timeout=(3.05, 27))

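If the remote server is very slow, you can tell Requests to wait forever for a response by passing None as the timeout value and then fetching a cup of coffee.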

r = requests.get('https://github.com', timeout=None)

My old (probably outdated) answer, posted a long time ago:

There are other ways to overcome this problem:

1. Use the TimeoutSauce internal class

From: https://github.com/kennethreitz/requests/issues/1928#issuecomment-35811896

import requests
from requests.adapters import TimeoutSauce

class MyTimeout(TimeoutSauce):
    def __init__(self, *args, **kwargs):
        connect = kwargs.get('connect', 5)
        read = kwargs.get('read', connect)
        super(MyTimeout, self).__init__(connect=connect, read=read)

requests.adapters.TimeoutSauce = MyTimeout
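
This code should set the read timeout equal to the connect timeout, which is the timeout value you pass in your Session.get() call. (Note that I haven't actually tested this code, so it may need some quick debugging; I wrote it straight into the GitHub window.)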

2. Use the fork of requests from kevinburke: https://github.com/kevinburke/requests/tree/connect-timeout

From its documentation: