aiohttp: set maximum number of requests per second

Posted 2023-03-30

【Published】: 2016-02-04 09:19:30

【Question】:

How can I set a maximum number of requests per second (i.e. throttle them) on the client side with aiohttp?

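【Comments】:

I wrote a small module named asyncio-throttle, now hosted on GitHub. Take a look at its simple implementation.

See quentin.pradet.me/blog/… for an implementation that, unlike asyncio-throttle, is specific to aiohttp and correctly limits the number of requests per second rather than just the number of concurrent connections.

By the way, using async with in asyncio-throttle is a nice idea!

【Answer 1】:

Although this is not exactly a limit on the number of requests per second, note that since v2.0, when using a ClientSession, aiohttp automatically limits the number of simultaneous connections to 100.

You can change the limit by creating your own TCPConnector and passing it to the ClientSession. For example, to create a client limited to 50 simultaneous requests: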

import aiohttp

connector = aiohttp.TCPConnector(limit=50)
client = aiohttp.ClientSession(connector=connector)

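If it suits your use case better, there is also a limit_per_host parameter (off by default) that you can pass to limit the number of simultaneous connections to the same "endpoint". According to the docs:

limit_per_host (int) – limit for simultaneous connections to the same endpoint. Endpoints are the same if they have an equal (host, port, is_ssl) triple.

Example usage: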

import aiohttp

connector = aiohttp.TCPConnector(limit_per_host=50)
client = aiohttp.ClientSession(connector=connector)
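
To make the "(host, port, is_ssl) triple" from the documentation quote more concrete, here is a rough illustration using only the standard library. The helper `endpoint_key` is hypothetical and is not part of aiohttp; it only approximates how two URLs could be judged to share an endpoint, assuming default ports of 80 and 443.

```python
from urllib.parse import urlsplit


def endpoint_key(url):
    """Hypothetical helper: approximate the (host, port, is_ssl) triple of a URL."""
    parts = urlsplit(url)
    is_ssl = parts.scheme == "https"
    # Assumption: fall back to the usual default port when none is given.
    port = parts.port or (443 if is_ssl else 80)
    return parts.hostname, port, is_ssl


if __name__ == "__main__":
    a = endpoint_key("https://example.com/a")
    b = endpoint_key("https://example.com:443/b?page=2")
    c = endpoint_key("http://example.com/a")
    print(a == b)  # same host, port and TLS flag, so they would share one limit
    print(a == c)  # different port and TLS flag, so they would be counted separately
```

Under this reading, limit_per_host=50 would cap connections per such triple, while limit caps the total across all of them.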

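【Comments】:

@GaryvanderMerwe Yes. However, the (unanimously upvoted) accepted answer also limits the number of concurrent requests rather than the rate, so I'm not sure why you only take issue with mine. Given the most common use case for these features, which is to keep the client from completely hammering some server with a flood of requests, either approach (limiting maximum connections or limiting the rate) works fine.

How does asyncio.Semaphore(5) differ from aiohttp.TCPConnector(limit_per_host=5)? Are they interchangeable?

How can TCPConnector be used to limit requests to a specific host only?

I have a hard time seeing how this solution throttles requests as the original question asks (limiting the number of requests per second). For example, you can have 5 parallel connections, but if responses are fast enough this does not stop you from hitting the remote more than 5 times per second.

@pcko1 Yes, you're right, this doesn't do exactly what the question asks; the (unfortunately now deleted) comment by GaryvanderMerwe, which my first comment in this thread was replying to, made the same point. Hopefully it's close enough to still be useful to some people, though! I've edited the answer to emphasize in the first sentence that this isn't exactly what was asked for.

【Answer 2】:

I found a possible solution here: http://compiletoi.net/fast-scraping-in-python-with-asyncio.html

Doing 3 requests at the same time is cool, but doing 5000 is not so nice. If you try to do too many requests at once, connections may start getting closed, or you might even get banned from the website.

To avoid this, you can use a semaphore. It is a synchronization tool that can be used to limit the number of coroutines that do something at a given point in time. We'll create the semaphore before creating the loop, passing the number of simultaneous requests we want to allow as an argument: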

sem = asyncio.Semaphore(5)

Then, we just replace:

page = yield from get(url, compress=True)

with the same thing, but protected by the semaphore:

with (yield from sem):
    page = yield from get(url, compress=True)

This will ensure that at most 5 requests can be done at the same time.
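
The quoted snippet uses the old generator-based syntax; acquiring a semaphore with `with (yield from sem)` has been deprecated since Python 3.7. A minimal sketch of the same idea with `async with` might look like the following. The `fetch` coroutine is a hypothetical stand-in that only sleeps instead of making an HTTP request, and the `state` dictionary exists purely to observe how many coroutines are inside the semaphore at once.

```python
import asyncio


async def fetch(url, sem, state):
    # Hypothetical stand-in for a real HTTP request; it only sleeps.
    async with sem:  # at most `limit` coroutines are inside this block
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)
        state["active"] -= 1
    return url


async def main(n_urls=20, limit=5):
    # Created inside the running loop, which also works on older Python 3 versions.
    sem = asyncio.BoundedSemaphore(limit)
    state = {"active": 0, "peak": 0}
    urls = [f"https://example.com/{i}" for i in range(n_urls)]
    pages = await asyncio.gather(*(fetch(u, sem, state) for u in urls))
    return pages, state["peak"]


if __name__ == "__main__":
    pages, peak = asyncio.run(main())
    print(len(pages), "pages fetched, peak concurrency:", peak)
```

As with the original snippet, this bounds concurrency only; it does not by itself cap the number of requests per second.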

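【Comments】:

The answer is technically valid. Just adding some nit comments for readers who refer to this answer in the future. Use asyncio.BoundedSemaphore(5) instead of Semaphore to prevent accidentally raising the original limit (***.com/a/48971158/6687477), and also use async with sem:. According to the docs: "Deprecated since version 3.7: Acquiring a lock using await lock or yield from lock and/or with statement (with await lock, with (yield from lock)) is deprecated. Use async with lock instead." (docs.python.org/3/library/…)

How does asyncio.Semaphore(5) differ from aiohttp.TCPConnector(limit_per_host=5)? Are they interchangeable?

【Answer 3】:

You can either set a delay per request, or group the URLs into batches and throttle the batches to meet the desired frequency.

1. Delay per request

Use asyncio.sleep to force the script to wait between requests: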

import asyncio
import aiohttp

delay_per_request = 0.5
urls = [
   # put some URLs here...
]

async def app():
    tasks = []
    for url in urls:
        tasks.append(asyncio.ensure_future(make_request(url)))
        await asyncio.sleep(delay_per_request)

    results = await asyncio.gather(*tasks)
    return results

async def make_request(url):
    print('$$$ making request')
    async with aiohttp.ClientSession() as sess:
        async with sess.get(url) as resp:
            status = resp.status
            text = await resp.text()
            print('### got page data')
            return url, status, text

This can be run with, for example, results = asyncio.run(app()).

2. Batch throttle

Using make_request from above, you can request and throttle batches of URLs like this:
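
One way to check when requests actually start under the delay-per-request approach is to record a timestamp at the top of each task. The sketch below does this with a hypothetical `fake_request` coroutine that sleeps instead of calling a server; the delay and task count are arbitrary values chosen for illustration.

```python
import asyncio
import time


async def fake_request(i, starts):
    # Hypothetical stand-in for make_request: note when the task really begins.
    starts.append(time.monotonic())
    await asyncio.sleep(0.2)  # pretend the server takes a while to answer
    return i


async def staggered(n=5, delay=0.05):
    starts, tasks = [], []
    for i in range(n):
        tasks.append(asyncio.ensure_future(fake_request(i, starts)))
        # Sleeping yields to the event loop, which lets the new task begin running.
        await asyncio.sleep(delay)
    results = await asyncio.gather(*tasks)
    gaps = [later - earlier for earlier, later in zip(starts, starts[1:])]
    return results, gaps


if __name__ == "__main__":
    results, gaps = asyncio.run(staggered())
    print([round(g, 3) for g in gaps])  # gaps between consecutive task starts
```

In this sketch the start times end up spaced by roughly the delay, while the simulated requests still overlap. The approach therefore spaces out when requests begin but does not bound how many are in flight at once.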

import asyncio
import aiohttp
import time

max_requests_per_second = 0.5
urls = [[
   # put a few URLs here...
],[
   # put a few more URLs here...
]]

async def app():
    results = []
    for i, batch in enumerate(urls):
        t_0 = time.time()
        print(f'batch {i}')
        tasks = [asyncio.ensure_future(make_request(url)) for url in batch]
        for t in tasks:
            d = await t
            results.append(d)
        t_1 = time.time()

        # Throttle requests
        batch_time = (t_1 - t_0)
        batch_size = len(batch)
        wait_time = (batch_size / max_requests_per_second) - batch_time
        if wait_time > 0:
            print(f'Too fast! Waiting {wait_time} seconds')
            # Non-blocking sleep, so the event loop is not frozen while waiting
            await asyncio.sleep(wait_time)

    return results

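Again, this can be run with asyncio.run(app()).

【Comments】:

The delay per request doesn't work. It only delays the gathering of the tasks, not the actual requests as they are submitted to the server.

【Answer 4】:

Here is an example without aiohttp, but you can wrap any async method, or aiohttp.request, with the Limit decorator: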

import asyncio
import time


class Limit(object):
    def __init__(self, calls=5, period=1):
        self.calls = calls
        self.period = period
        self.clock = time.monotonic
        self.last_reset = 0
        self.num_calls = 0

    def __call__(self, func):
        async def wrapper(*args, **kwargs):
            if self.num_calls >= self.calls:
                await asyncio.sleep(self.__period_remaining())

            period_remaining = self.__period_remaining()

            if period_remaining <= 0:
                self.num_calls = 0
                self.last_reset = self.clock()

            self.num_calls += 1

            return await func(*args, **kwargs)

        return wrapper

    def __period_remaining(self):
        elapsed = self.clock() - self.last_reset
        return self.period - elapsed


@Limit(calls=5, period=2)
async def test_call(x):
    print(x)


async def worker():
    for x in range(100):
        await test_call(x + 1)


asyncio.run(worker())
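
Note that the decorator above checks its counter only once before sleeping, so if many tasks are waiting at the same moment (for example under asyncio.gather) they may all proceed together once the sleep ends. A different technique that tolerates concurrent callers is to hand out evenly spaced time slots under a lock. The `RateLimiter` class below is a hypothetical sketch of that idea, not part of asyncio or aiohttp, and it enforces the rate only approximately.

```python
import asyncio
import time


class RateLimiter:
    """Hypothetical helper: space entries so that roughly `calls` of them
    happen per `period` seconds, even when many tasks wait at once."""

    def __init__(self, calls=5, period=1.0):
        self.interval = period / calls
        self.lock = asyncio.Lock()
        self.next_slot = 0.0

    async def __aenter__(self):
        async with self.lock:  # one task at a time claims the next slot
            now = time.monotonic()
            wait = self.next_slot - now
            if wait > 0:
                await asyncio.sleep(wait)
                now = time.monotonic()
            self.next_slot = max(now, self.next_slot) + self.interval
        return self

    async def __aexit__(self, exc_type, exc, tb):
        return False  # do not swallow exceptions


async def job(i, limiter, stamps):
    async with limiter:
        stamps.append(time.monotonic())  # a real request would go here
    return i


async def main(n=10, calls=5, period=0.25):
    # Build the limiter inside the running loop so its lock binds to that loop.
    limiter = RateLimiter(calls, period)
    stamps = []
    await asyncio.gather(*(job(i, limiter, stamps) for i in range(n)))
    return stamps


if __name__ == "__main__":
    stamps = asyncio.run(main())
    print(f"{len(stamps)} calls over {stamps[-1] - stamps[0]:.2f} seconds")
```

With a real client, the body of `job` would hold the request, for example an `async with session.get(url)` block placed inside `async with limiter:`.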
