Fastapi python code execution speed impacted by deployment with uvicorn vs gunicorn

Posted: 2021-08-17 09:08:00

【Question】:

I wrote a fastapi application. Now I'm thinking about deploying it, but I seem to be running into strange, unexpected performance problems that appear to depend on whether I use uvicorn or gunicorn. In particular, with gunicorn all code (even standard-library pure-Python code) seems to get slower. To debug the performance, I wrote a small app that demonstrates this:

import asyncio, time
from fastapi import FastAPI, Path
from datetime import datetime

app = FastAPI()

@app.get("/delay/delay1/delay2")
async def get_delay(
    delay1: float = Path(..., title="Nonblocking time taken to respond"),
    delay2: float = Path(..., title="Blocking time taken to respond"),
):
    total_start_time = datetime.now()
    times = []
    for i in range(100):
        start_time = datetime.now()
        await asyncio.sleep(delay1)
        time.sleep(delay2)
        times.append(str(datetime.now()-start_time))
    return "delays":[delay1,delay2],"total_time_taken":str(datetime.now()-total_start_time),"times":times

Running the fastapi application with:

gunicorn api.performance_test:app -b localhost:8001 -k uvicorn.workers.UvicornWorker --workers 1

the response body from hitting http://localhost:8001/delay/0.0/0.0 consistently looks like this:


  "delays": [
    0.0,
    0.0
  ],
  "total_time_taken": "0:00:00.057946",
  "times": [
    "0:00:00.000323",
    ...smilar values omitted for brevity...
    "0:00:00.000274"
  ]

But using:

uvicorn api.performance_test:app --port 8001 

I consistently get timings like this:


  "delays": [
    0.0,
    0.0
  ],
  "total_time_taken": "0:00:00.002630",
  "times": [
    "0:00:00.000037",
    ...snip...
    "0:00:00.000020"
  ]

The difference becomes even more pronounced when I comment out the await asyncio.sleep(delay1) statement.

So I'm wondering: what do gunicorn/uvicorn do to the python/fastapi runtime to create a 10x difference in code execution speed?

I ran these tests on OS X 11.2.3 with an Intel I7 processor, using Python 3.8.2.

These are the relevant parts of my pip freeze output:

fastapi==0.65.1
gunicorn==20.1.0
uvicorn==0.13.4

【Comments】:

【Answer 1】:

I could not reproduce your results.

My environment: Ubuntu on WSL2 on Windows 10

Relevant parts of my pip freeze output:

fastapi==0.65.1
gunicorn==20.1.0
uvicorn==0.14.0

I modified the code slightly:

import asyncio, time
from fastapi import FastAPI, Path
from datetime import datetime
import statistics

app = FastAPI()

@app.get("/delay/delay1/delay2")
async def get_delay(
    delay1: float = Path(..., title="Nonblocking time taken to respond"),
    delay2: float = Path(..., title="Blocking time taken to respond"),
):
    total_start_time = datetime.now()
    times = []
    for i in range(100):
        start_time = datetime.now()
        await asyncio.sleep(delay1)
        time.sleep(delay2)
        time_delta= (datetime.now()-start_time).microseconds
        times.append(time_delta)

    times_average = statistics.mean(times)

    return "delays":[delay1,delay2],"total_time_taken":(datetime.now()-total_start_time).microseconds,"times_avarage":times_average,"times":times

Apart from the very first load of the site, the results for both methods are almost identical.

Most times for both methods fall between 0:00:00.000530 and 0:00:00.000620.

The first attempt for each takes longer: around 0:00:00.003000. However, after I restarted Windows and tried these tests again, I noticed that times no longer increased on the first request after server startup (I think that's thanks to a lot of free RAM after the restart).


Examples of non-first runs (3 attempts):

# `uvicorn performance_test:app --port 8083`

"delays":[0.0,0.0],"total_time_taken":553,"times_avarage":4.4,"times":[15,7,5,4,4,4,4,5,5,4,4,5,4,4,5,4,4,5,4,4,5,4,4,5,4,4,4,5,4,4,5,4,4,5,4,4,4,4,4,5,4,5,5,4,4,4,4,4,4,5,4,4,4,5,4,4,4,4,4,4,5,4,4,5,4,4,4,4,5,4,4,5,4,4,4,4,4,5,4,4,5,4,4,5,4,4,5,4,4,4,4,4,4,4,5,4,4,4,5,4]
"delays":[0.0,0.0],"total_time_taken":575,"times_avarage":4.61,"times":[15,6,5,5,5,5,5,5,5,5,5,4,5,5,5,5,4,4,4,4,4,5,5,5,4,5,4,4,4,5,5,5,4,5,5,4,4,4,4,5,5,5,5,4,4,4,4,5,5,4,4,4,4,4,4,4,4,5,5,4,4,4,4,5,5,5,5,5,5,5,4,4,4,4,5,5,4,5,5,4,4,4,4,4,4,5,5,5,4,4,4,4,5,5,5,5,4,4,4,4]
"delays":[0.0,0.0],"total_time_taken":548,"times_avarage":4.31,"times":[14,6,5,4,4,4,4,4,4,4,5,4,4,4,4,4,4,5,4,4,5,4,4,4,4,4,4,4,5,4,4,4,5,4,4,4,4,4,4,4,4,5,4,4,4,4,4,4,5,4,4,4,4,4,5,5,4,4,4,4,4,4,4,5,4,4,4,4,4,5,4,4,5,4,4,5,4,4,5,4,4,4,4,4,4,4,5,4,4,5,4,4,5,4,4,5,4,4,4,4]


# `gunicorn performance_test:app -b localhost:8084 -k uvicorn.workers.UvicornWorker --workers 1`

"delays":[0.0,0.0],"total_time_taken":551,"times_avarage":4.34,"times":[13,6,5,5,5,5,5,4,4,4,5,4,4,4,4,4,5,4,4,5,4,4,5,4,4,4,4,4,5,4,4,4,4,4,5,4,4,4,4,4,4,4,5,4,4,5,4,4,4,4,4,4,4,4,5,4,4,4,4,4,4,4,5,4,4,4,4,4,4,4,4,4,5,4,4,5,4,5,4,4,5,4,4,4,4,5,4,4,5,4,4,4,4,4,4,4,5,4,4,5]
"delays":[0.0,0.0],"total_time_taken":558,"times_avarage":4.48,"times":[14,7,5,5,5,5,5,5,4,4,4,4,4,4,5,5,4,4,4,4,5,4,4,4,5,5,4,4,4,5,5,4,4,4,5,4,4,4,5,5,4,4,4,4,5,5,4,4,5,5,4,4,5,5,4,4,4,5,4,4,5,4,4,5,5,4,4,4,5,4,4,4,5,4,4,4,5,4,5,4,4,4,5,4,4,4,5,4,4,4,5,4,4,4,5,4,4,4,5,4]
"delays":[0.0,0.0],"total_time_taken":550,"times_avarage":4.34,"times":[15,6,5,4,4,4,4,4,4,5,4,4,4,4,4,5,4,4,5,4,4,5,4,4,4,4,4,5,4,4,4,4,5,5,4,4,4,4,5,4,4,4,4,4,5,4,4,5,4,4,5,4,4,5,4,4,5,4,4,5,4,4,4,4,4,4,5,4,4,5,4,4,4,4,4,4,4,4,4,5,4,4,5,4,4,4,4,4,4,4,4,5,4,4,5,4,4,4,4,4]

Examples of non-first runs with await asyncio.sleep(delay1) commented out (3 attempts):

# `uvicorn performance_test:app --port 8083`

"delays":[0.0,0.0],"total_time_taken":159,"times_avarage":0.6,"times":[3,1,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,0,0,1,1,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,0]
"delays":[0.0,0.0],"total_time_taken":162,"times_avarage":0.49,"times":[3,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,0,1,0,0,0,0,1,1,1,1,1,0,0,0,0,1,1,1,1,0,0,1,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1]
"delays":[0.0,0.0],"total_time_taken":156,"times_avarage":0.61,"times":[3,1,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,1,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1]


# `gunicorn performance_test:app -b localhost:8084 -k uvicorn.workers.UvicornWorker --workers 1`

"delays":[0.0,0.0],"total_time_taken":159,"times_avarage":0.59,"times":[2,0,0,0,0,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0,0,0,0,1,0,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0,0]
"delays":[0.0,0.0],"total_time_taken":165,"times_avarage":0.62,"times":[3,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,1,0,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1]
"delays":[0.0,0.0],"total_time_taken":164,"times_avarage":0.54,"times":[2,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1]

I made a Python script to benchmark these times more precisely:

import statistics
import requests
from time import sleep

number_of_tests=1000

sites_to_test=[
    {
        'name':'only uvicorn    ',
        'url':'http://127.0.0.1:8083/delay/0.0/0.0'
    },
    {
        'name':'gunicorn+uvicorn',
        'url':'http://127.0.0.1:8084/delay/0.0/0.0'
    }
]


for test in sites_to_test:

    total_time_taken_list=[]
    times_avarage_list=[]

    requests.get(test['url']) # first request may be slower, so better to not measure it

    for a in range(number_of_tests):
        r = requests.get(test['url'])
        json= r.json()

        total_time_taken_list.append(json['total_time_taken'])
        times_avarage_list.append(json['times_avarage'])
        # sleep(1) # results are slightly different with sleep between requests

    total_time_taken_avarage=statistics.mean(total_time_taken_list)
    times_avarage_avarage=statistics.mean(times_avarage_list)

    print({'name':test['name'], 'number_of_tests':number_of_tests, 'total_time_taken_avarage':total_time_taken_avarage, 'times_avarage_avarage':times_avarage_avarage})

Results:

{'name': 'only uvicorn    ', 'number_of_tests': 2000, 'total_time_taken_avarage': 586.5985, 'times_avarage_avarage': 4.820865}
{'name': 'gunicorn+uvicorn', 'number_of_tests': 2000, 'total_time_taken_avarage': 571.8415, 'times_avarage_avarage': 4.719035}

Results with await asyncio.sleep(delay1) commented out:

{'name': 'only uvicorn    ', 'number_of_tests': 2000, 'total_time_taken_avarage': 151.301, 'times_avarage_avarage': 0.602495}
{'name': 'gunicorn+uvicorn', 'number_of_tests': 2000, 'total_time_taken_avarage': 144.4655, 'times_avarage_avarage': 0.59196}

I also made another version of the above script that switches the url on every request (it gives slightly longer times):

import statistics
import requests
from time import sleep

number_of_tests=1000

sites_to_test=[
    {
        'name':'only uvicorn    ',
        'url':'http://127.0.0.1:8083/delay/0.0/0.0',
        'total_time_taken_list':[],
        'times_avarage_list':[]
    },
    {
        'name':'gunicorn+uvicorn',
        'url':'http://127.0.0.1:8084/delay/0.0/0.0',
        'total_time_taken_list':[],
        'times_avarage_list':[]
    }
]


for test in sites_to_test:
    requests.get(test['url']) # first request may be slower, so better to not measure it

for a in range(number_of_tests):

    for test in sites_to_test:
        r = requests.get(test['url'])
        json= r.json()

        test['total_time_taken_list'].append(json['total_time_taken'])
        test['times_avarage_list'].append(json['times_avarage'])
        # sleep(1) # results are slightly different with sleep between requests


for test in sites_to_test:
    total_time_taken_avarage=statistics.mean(test['total_time_taken_list'])
    times_avarage_avarage=statistics.mean(test['times_avarage_list'])

    print({'name':test['name'], 'number_of_tests':number_of_tests, 'total_time_taken_avarage':total_time_taken_avarage, 'times_avarage_avarage':times_avarage_avarage})

Results:

{'name': 'only uvicorn    ', 'number_of_tests': 2000, 'total_time_taken_avarage': 589.4315, 'times_avarage_avarage': 4.789385}
{'name': 'gunicorn+uvicorn', 'number_of_tests': 2000, 'total_time_taken_avarage': 589.0915, 'times_avarage_avarage': 4.761095}

Results with await asyncio.sleep(delay1) commented out:

{'name': 'only uvicorn    ', 'number_of_tests': 2000, 'total_time_taken_avarage': 152.8365, 'times_avarage_avarage': 0.59173}
{'name': 'gunicorn+uvicorn', 'number_of_tests': 2000, 'total_time_taken_avarage': 154.4525, 'times_avarage_avarage': 0.59768}

This answer should help you debug your results better.

I think it may help the investigation if you share more details about your OS/machine.

Also, please restart your computer/server; it may have an impact.


Update 1:

I noticed that I used a newer version of uvicorn, 0.14.0, rather than the 0.13.4 stated in the question. I also tested with the older version 0.13.4, but the results were similar; I still could not reproduce your results.


Update 2:

I ran some more benchmarks and noticed something interesting:

With uvloop in requirements.txt:

Whole requirements.txt:

uvicorn==0.14.0
fastapi==0.65.1
gunicorn==20.1.0
uvloop==0.15.2

Results:

{'name': 'only uvicorn    ', 'number_of_tests': 500, 'total_time_taken_avarage': 362.038, 'times_avarage_avarage': 2.54142}
{'name': 'gunicorn+uvicorn', 'number_of_tests': 500, 'total_time_taken_avarage': 366.814, 'times_avarage_avarage': 2.56766}

Without uvloop in requirements.txt:

Whole requirements.txt:

uvicorn==0.14.0
fastapi==0.65.1
gunicorn==20.1.0

Results:

{'name': 'only uvicorn    ', 'number_of_tests': 500, 'total_time_taken_avarage': 595.578, 'times_avarage_avarage': 4.83828}
{'name': 'gunicorn+uvicorn', 'number_of_tests': 500, 'total_time_taken_avarage': 584.64, 'times_avarage_avarage': 4.7155}
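To check which event loop implementation your app actually runs on, a stdlib-only diagnostic sketch (my own, not part of the original benchmarks) is to inspect the running loop's class; when uvloop's policy is active the class is typically uvloop.Loop, otherwise a stock asyncio loop:

```python
import asyncio

async def loop_class():
    # Returns e.g. "asyncio.unix_events._UnixSelectorEventLoop" on stock
    # asyncio, or "uvloop.Loop" when uvloop's event loop policy is active.
    loop = asyncio.get_running_loop()
    return f"{type(loop).__module__}.{type(loop).__name__}"

print(asyncio.run(loop_class()))
```

Calling asyncio.get_running_loop() inside the get_delay endpoint would show whether the plain uvicorn run and the gunicorn UvicornWorker run picked different loop implementations. As far as I know, uvicorn also accepts a --loop option (auto/asyncio/uvloop) to pin the implementation explicitly, but verify that against your uvicorn version's --help.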

Update 3:

I used only Python 3.9.5 in this answer.

【Comments】:

Thanks for the extensive testing! My OS/machine was already hidden somewhere in my long question: I ran these tests on OS X 11.2.3 with an Intel I7 processor, using Python 3.8.2. I'll see if I can also run some tests on a plain Ubuntu machine. Thanks also for pointing out that simply installing uvloop gives a significant performance boost!

@M.D. Right, I missed that. I used only Python 3.9.5 in this answer, so it differs from your version as well. My CPU is a Ryzen 3700x.

【Answer 2】:

The difference comes down to the underlying web server you use.

An analogy could be: two cars, same brand, same options, just a different engine, what's the difference?

Web servers aren't exactly like cars, but I think you get the point I'm trying to make.

Basically, gunicorn is a synchronous web server, while uvicorn is an asynchronous web server. Since you're using fastapi and the await keyword, I guess you already know what asyncio/asynchronous programming is.

I don't know the differences in the code, so take my answer with a grain of salt, but uvicorn is more performant because of the asynchronous part. My guess about the timing difference is that an async web server is already configured at startup to handle async functions, while a sync web server isn't, and there's some kind of overhead to abstract that part away.
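The practical consequence of the sync/async distinction shows up whenever a handler blocks the event loop. A self-contained sketch (my own illustration, not from this answer) of why a blocking call inside an async handler hurts:

```python
import asyncio
import time

async def blocking_handler(delay):
    # time.sleep() blocks the event loop's thread: nothing else runs meanwhile
    time.sleep(delay)

async def nonblocking_handler(delay):
    # await asyncio.sleep() yields control, so other tasks run concurrently
    await asyncio.sleep(delay)

async def main(n=10, delay=0.01):
    start = time.perf_counter()
    await asyncio.gather(*[nonblocking_handler(delay) for _ in range(n)])
    concurrent = time.perf_counter() - start

    start = time.perf_counter()
    await asyncio.gather(*[blocking_handler(delay) for _ in range(n)])
    serialized = time.perf_counter() - start
    return concurrent, serialized

concurrent, serialized = asyncio.run(main())
print(f"await asyncio.sleep: {concurrent:.3f}s, time.sleep: {serialized:.3f}s")
```

The ten awaited sleeps overlap, while the ten blocking sleeps run back to back, which is roughly the situation the question's delay1/delay2 parameters were designed to probe.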

This isn't a proper answer, but it hints at where the difference may lie.

【Comments】:

Thanks for the reply, and for giving me some context. I would understand where the timing difference came from if I did the timing outside the function call, e.g. in an external stress-testing tool. However, all the timing code is inside the get_delay code. Even if I put the body of the get_delay function in a separate synchronous function (without the asyncio.sleep of course, since it would now live in a function where await is illegal) and have only async def get_delay(delay1, delay2): return sync_function_call(delay1, delay2), I get similar timing differences.

So for some reason it seems that all CPU-bound python code gets slower when running under gunicorn. The same goes for CPU-bound code in imported python packages. The only explanation I can think of is that maybe gunicorn installs some hooks that get triggered by some very common event in pure python code execution.

These are two engines optimized for different things. gunicorn was created for synchronous code, while uvicorn was created for asynchronous code. Also, there's a slight possibility that uvicorn exposes uvloop's event loop instead of the built-in asyncio event loop, and the former is much faster than the latter. I'm not sure about this, though the benchmarks give good results: github.com/MagicStack/uvloop

My suggestion is not to care too much about performance unless it's a hard constraint for your project. If ASGI servers are available, use one of them (which makes sense, since you're using an ASGI framework); otherwise use a WSGI one like gunicorn. The former is optimized to run asynchronous functions in fastapi, the latter isn't.

【Answer 3】:

Since fastapi is an ASGI framework, it delivers better performance when served by an ASGI server such as uvicorn or hypercorn. A WSGI server such as gunicorn cannot provide performance like uvicorn, because ASGI servers are optimized for asynchronous functions. fastapi's official documentation also encourages using an ASGI server such as uvicorn or hypercorn:

https://fastapi.tiangolo.com/#installation

【Comments】:

Considering that gunicorn can be used together with uvicorn to take advantage of multiple cores/CPUs, gunicorn can serve asgi and is one of the recommended ways of serving uvicorn. uvicorn.org/deployment/#gunicorn
