Fastapi python code execution speed impacted by deployment with uvicorn vs gunicorn
Posted: 2021-08-17 09:08:00

Question:
I have written a fastapi app, and now I am thinking about deploying it. However, I seem to get strange, unexpected performance problems that appear to depend on whether I use uvicorn or gunicorn. In particular, all code (even standard-library pure-Python code) seems to get slower if I use gunicorn. For performance debugging I wrote a small application that demonstrates this:
import asyncio, time
from fastapi import FastAPI, Path
from datetime import datetime

app = FastAPI()


@app.get("/delay/{delay1}/{delay2}")
async def get_delay(
    delay1: float = Path(..., title="Nonblocking time taken to respond"),
    delay2: float = Path(..., title="Blocking time taken to respond"),
):
    total_start_time = datetime.now()
    times = []
    for i in range(100):
        start_time = datetime.now()
        await asyncio.sleep(delay1)
        time.sleep(delay2)
        times.append(str(datetime.now() - start_time))
    return {"delays": [delay1, delay2], "total_time_taken": str(datetime.now() - total_start_time), "times": times}
Running the fastapi application with:
gunicorn api.performance_test:app -b localhost:8001 -k uvicorn.workers.UvicornWorker --workers 1
the response body for hitting http://localhost:8001/delay/0.0/0.0 is consistently something like:
"delays": [
0.0,
0.0
],
"total_time_taken": "0:00:00.057946",
"times": [
"0:00:00.000323",
...smilar values omitted for brevity...
"0:00:00.000274"
]
But using:
uvicorn api.performance_test:app --port 8001
I consistently get times like these:
"delays": [
0.0,
0.0
],
"total_time_taken": "0:00:00.002630",
"times": [
"0:00:00.000037",
...snip...
"0:00:00.000020"
]
The difference becomes even more pronounced when I comment out the await asyncio.sleep(delay1) statement.
So I am wondering what gunicorn/uvicorn do to the python/fastapi runtime to produce this factor-of-ten difference in code execution speed.
I performed these tests using Python 3.8.2 on OS X 11.2.3 with an Intel I7 processor.
These are the relevant parts of my pip freeze output:
fastapi==0.65.1
gunicorn==20.1.0
uvicorn==0.13.4
Answer 1:
I am not able to reproduce your results.
My environment: Ubuntu on WSL2 on Windows 10
Relevant parts of my pip freeze output:
fastapi==0.65.1
gunicorn==20.1.0
uvicorn==0.14.0
I modified the code slightly:
import asyncio, time
from fastapi import FastAPI, Path
from datetime import datetime
import statistics

app = FastAPI()


@app.get("/delay/{delay1}/{delay2}")
async def get_delay(
    delay1: float = Path(..., title="Nonblocking time taken to respond"),
    delay2: float = Path(..., title="Blocking time taken to respond"),
):
    total_start_time = datetime.now()
    times = []
    for i in range(100):
        start_time = datetime.now()
        await asyncio.sleep(delay1)
        time.sleep(delay2)
        time_delta = (datetime.now() - start_time).microseconds
        times.append(time_delta)
    times_average = statistics.mean(times)
    return {"delays": [delay1, delay2], "total_time_taken": (datetime.now() - total_start_time).microseconds, "times_avarage": times_average, "times": times}
Except for the first load of the website, the results of both methods are nearly identical. Most of the time they are between 0:00:00.000530 and 0:00:00.000620 for both methods. The first attempt for each takes longer: around 0:00:00.003000.
However, after I restarted Windows and tried these tests again, I noticed that times no longer increase for the first requests after server startup (I think that is thanks to a lot of free RAM after the restart).
Examples of non-first-time runs (3 attempts):
# `uvicorn performance_test:app --port 8083`
"delays":[0.0,0.0],"total_time_taken":553,"times_avarage":4.4,"times":[15,7,5,4,4,4,4,5,5,4,4,5,4,4,5,4,4,5,4,4,5,4,4,5,4,4,4,5,4,4,5,4,4,5,4,4,4,4,4,5,4,5,5,4,4,4,4,4,4,5,4,4,4,5,4,4,4,4,4,4,5,4,4,5,4,4,4,4,5,4,4,5,4,4,4,4,4,5,4,4,5,4,4,5,4,4,5,4,4,4,4,4,4,4,5,4,4,4,5,4]
"delays":[0.0,0.0],"total_time_taken":575,"times_avarage":4.61,"times":[15,6,5,5,5,5,5,5,5,5,5,4,5,5,5,5,4,4,4,4,4,5,5,5,4,5,4,4,4,5,5,5,4,5,5,4,4,4,4,5,5,5,5,4,4,4,4,5,5,4,4,4,4,4,4,4,4,5,5,4,4,4,4,5,5,5,5,5,5,5,4,4,4,4,5,5,4,5,5,4,4,4,4,4,4,5,5,5,4,4,4,4,5,5,5,5,4,4,4,4]
"delays":[0.0,0.0],"total_time_taken":548,"times_avarage":4.31,"times":[14,6,5,4,4,4,4,4,4,4,5,4,4,4,4,4,4,5,4,4,5,4,4,4,4,4,4,4,5,4,4,4,5,4,4,4,4,4,4,4,4,5,4,4,4,4,4,4,5,4,4,4,4,4,5,5,4,4,4,4,4,4,4,5,4,4,4,4,4,5,4,4,5,4,4,5,4,4,5,4,4,4,4,4,4,4,5,4,4,5,4,4,5,4,4,5,4,4,4,4]
# `gunicorn performance_test:app -b localhost:8084 -k uvicorn.workers.UvicornWorker --workers 1`
"delays":[0.0,0.0],"total_time_taken":551,"times_avarage":4.34,"times":[13,6,5,5,5,5,5,4,4,4,5,4,4,4,4,4,5,4,4,5,4,4,5,4,4,4,4,4,5,4,4,4,4,4,5,4,4,4,4,4,4,4,5,4,4,5,4,4,4,4,4,4,4,4,5,4,4,4,4,4,4,4,5,4,4,4,4,4,4,4,4,4,5,4,4,5,4,5,4,4,5,4,4,4,4,5,4,4,5,4,4,4,4,4,4,4,5,4,4,5]
"delays":[0.0,0.0],"total_time_taken":558,"times_avarage":4.48,"times":[14,7,5,5,5,5,5,5,4,4,4,4,4,4,5,5,4,4,4,4,5,4,4,4,5,5,4,4,4,5,5,4,4,4,5,4,4,4,5,5,4,4,4,4,5,5,4,4,5,5,4,4,5,5,4,4,4,5,4,4,5,4,4,5,5,4,4,4,5,4,4,4,5,4,4,4,5,4,5,4,4,4,5,4,4,4,5,4,4,4,5,4,4,4,5,4,4,4,5,4]
"delays":[0.0,0.0],"total_time_taken":550,"times_avarage":4.34,"times":[15,6,5,4,4,4,4,4,4,5,4,4,4,4,4,5,4,4,5,4,4,5,4,4,4,4,4,5,4,4,4,4,5,5,4,4,4,4,5,4,4,4,4,4,5,4,4,5,4,4,5,4,4,5,4,4,5,4,4,5,4,4,4,4,4,4,5,4,4,5,4,4,4,4,4,4,4,4,4,5,4,4,5,4,4,4,4,4,4,4,4,5,4,4,5,4,4,4,4,4]
Examples of non-first-time runs with await asyncio.sleep(delay1) commented out (3 attempts):
# `uvicorn performance_test:app --port 8083`
"delays":[0.0,0.0],"total_time_taken":159,"times_avarage":0.6,"times":[3,1,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,0,0,1,1,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,0]
"delays":[0.0,0.0],"total_time_taken":162,"times_avarage":0.49,"times":[3,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,0,1,0,0,0,0,1,1,1,1,1,0,0,0,0,1,1,1,1,0,0,1,0,0,0,0,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,0,0,0,0,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1,1,1,0,0,0,0,1,1]
"delays":[0.0,0.0],"total_time_taken":156,"times_avarage":0.61,"times":[3,1,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,1,0,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1]
# `gunicorn performance_test:app -b localhost:8084 -k uvicorn.workers.UvicornWorker --workers 1`
"delays":[0.0,0.0],"total_time_taken":159,"times_avarage":0.59,"times":[2,0,0,0,0,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0,0,0,0,1,0,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0,0,0,0,1,1,1,1,1,0,0]
"delays":[0.0,0.0],"total_time_taken":165,"times_avarage":0.62,"times":[3,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,1,0,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1]
"delays":[0.0,0.0],"total_time_taken":164,"times_avarage":0.54,"times":[2,0,0,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,0,0,0,1,1,1,1,1]
I made a Python script to benchmark those times more precisely:
import statistics
import requests
from time import sleep

number_of_tests = 1000

sites_to_test = [
    {
        'name': 'only uvicorn ',
        'url': 'http://127.0.0.1:8083/delay/0.0/0.0'
    },
    {
        'name': 'gunicorn+uvicorn',
        'url': 'http://127.0.0.1:8084/delay/0.0/0.0'
    }
]


for test in sites_to_test:

    total_time_taken_list = []
    times_avarage_list = []

    requests.get(test['url'])  # first request may be slower, so better to not measure it

    for a in range(number_of_tests):
        r = requests.get(test['url'])
        json = r.json()
        total_time_taken_list.append(json['total_time_taken'])
        times_avarage_list.append(json['times_avarage'])
        # sleep(1) # results are slightly different with sleep between requests

    total_time_taken_avarage = statistics.mean(total_time_taken_list)
    times_avarage_avarage = statistics.mean(times_avarage_list)

    print({'name': test['name'], 'number_of_tests': number_of_tests, 'total_time_taken_avarage': total_time_taken_avarage, 'times_avarage_avarage': times_avarage_avarage})
Results:
{'name': 'only uvicorn ', 'number_of_tests': 2000, 'total_time_taken_avarage': 586.5985, 'times_avarage_avarage': 4.820865}
{'name': 'gunicorn+uvicorn', 'number_of_tests': 2000, 'total_time_taken_avarage': 571.8415, 'times_avarage_avarage': 4.719035}
Results with await asyncio.sleep(delay1) commented out:
{'name': 'only uvicorn ', 'number_of_tests': 2000, 'total_time_taken_avarage': 151.301, 'times_avarage_avarage': 0.602495}
{'name': 'gunicorn+uvicorn', 'number_of_tests': 2000, 'total_time_taken_avarage': 144.4655, 'times_avarage_avarage': 0.59196}
I also made another version of the above script that changes the urls every 1 request (it gives slightly higher times):
import statistics
import requests
from time import sleep

number_of_tests = 1000

sites_to_test = [
    {
        'name': 'only uvicorn ',
        'url': 'http://127.0.0.1:8083/delay/0.0/0.0',
        'total_time_taken_list': [],
        'times_avarage_list': []
    },
    {
        'name': 'gunicorn+uvicorn',
        'url': 'http://127.0.0.1:8084/delay/0.0/0.0',
        'total_time_taken_list': [],
        'times_avarage_list': []
    }
]


for test in sites_to_test:
    requests.get(test['url'])  # first request may be slower, so better to not measure it

for a in range(number_of_tests):
    for test in sites_to_test:
        r = requests.get(test['url'])
        json = r.json()
        test['total_time_taken_list'].append(json['total_time_taken'])
        test['times_avarage_list'].append(json['times_avarage'])
    # sleep(1) # results are slightly different with sleep between requests


for test in sites_to_test:
    total_time_taken_avarage = statistics.mean(test['total_time_taken_list'])
    times_avarage_avarage = statistics.mean(test['times_avarage_list'])
    print({'name': test['name'], 'number_of_tests': number_of_tests, 'total_time_taken_avarage': total_time_taken_avarage, 'times_avarage_avarage': times_avarage_avarage})
Results:
{'name': 'only uvicorn ', 'number_of_tests': 2000, 'total_time_taken_avarage': 589.4315, 'times_avarage_avarage': 4.789385}
{'name': 'gunicorn+uvicorn', 'number_of_tests': 2000, 'total_time_taken_avarage': 589.0915, 'times_avarage_avarage': 4.761095}
Results with await asyncio.sleep(delay1) commented out:
{'name': 'only uvicorn ', 'number_of_tests': 2000, 'total_time_taken_avarage': 152.8365, 'times_avarage_avarage': 0.59173}
{'name': 'gunicorn+uvicorn', 'number_of_tests': 2000, 'total_time_taken_avarage': 154.4525, 'times_avarage_avarage': 0.59768}
This answer should help you debug your own results better.
I think it might help the investigation if you shared more details about your OS / machine.
Also, please restart your computer/server; it may have an impact.
Update 1:
I noticed that I used a newer version of uvicorn, 0.14.0, instead of the 0.13.4 stated in the question. I also tested with the older version 0.13.4, but the results are similar; I still cannot reproduce your results.
Update 2:
I ran some more benchmarks and found something interesting:
With uvloop in requirements.txt:
Whole requirements.txt:
uvicorn==0.14.0
fastapi==0.65.1
gunicorn==20.1.0
uvloop==0.15.2
Results:
{'name': 'only uvicorn ', 'number_of_tests': 500, 'total_time_taken_avarage': 362.038, 'times_avarage_avarage': 2.54142}
{'name': 'gunicorn+uvicorn', 'number_of_tests': 500, 'total_time_taken_avarage': 366.814, 'times_avarage_avarage': 2.56766}
Without uvloop in requirements.txt:
Whole requirements.txt:
uvicorn==0.14.0
fastapi==0.65.1
gunicorn==20.1.0
Results:
{'name': 'only uvicorn ', 'number_of_tests': 500, 'total_time_taken_avarage': 595.578, 'times_avarage_avarage': 4.83828}
{'name': 'gunicorn+uvicorn', 'number_of_tests': 500, 'total_time_taken_avarage': 584.64, 'times_avarage_avarage': 4.7155}
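If you want to confirm which of those two configurations a running server process is actually using, a small diagnostic endpoint (a hypothetical addition, not part of the benchmark above) can report the concrete event-loop class; with uvloop installed, uvicorn's default loop selection normally picks it up:

import asyncio

# Hypothetical extra route on the same `app` as in the test application above.
@app.get("/which-loop")
async def which_loop():
    loop = asyncio.get_running_loop()
    # Typically reports "uvloop.Loop" when uvloop is installed and selected,
    # otherwise asyncio's default selector event loop class.
    return {"loop": f"{type(loop).__module__}.{type(loop).__qualname__}"}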
Update 3:
I only used Python 3.9.5 in this answer.
Comments:
Thanks for your extensive testing! My OS/machine was already hidden somewhere in my long question: I performed these tests using Python 3.8.2 on OS X 11.2.3 with an Intel I7 processor. I'll see whether I can also run some tests on a plain Ubuntu machine. Also thanks for pointing out that just installing uvloop gives a significant performance boost!
@M.D. Okay, I missed that. I only used Python 3.9.5 in this answer, so it also differs from your version. My CPU is a Ryzen 3700x.

Answer 2:
The difference lies in the underlying web server you are using.
An analogy could be: two cars, same brand, same options, just a different engine, what's the difference?
Web servers are not exactly like cars, but I guess you get the point I am trying to make.
Basically, gunicorn is a synchronous web server, while uvicorn is an asynchronous web server. Since you are using fastapi and the await keyword, I guess you already know what asyncio / asynchronous programming is.
I don't know the code differences, so take my answer with a grain of salt, but uvicorn is more performant because of the asynchronous part. My guess for the timing difference is that if you use an async web server, it is already configured at startup to handle async functions, while if you use a sync web server, it is not, and there is some kind of overhead to abstract that part away.
It is not a proper answer, but it gives you a hint about where the difference could lie.
Comments:
Thanks for the response, and for giving me some context. I would understand where the timing difference comes from if the timing happened outside the function call, e.g. in an external stress-testing tool. However, all of the timing code is inside the get_delay code. Even if I put the body of the get_delay function into a separate synchronous function (without the asyncio.sleep of course, since it would now live in a function where await is illegal) and only have async def get_delay(delay1, delay2): return sync_function_call(delay1, delay2), I get similar timing differences.
So for some reason it seems that all CPU-bound Python code gets slower when running under gunicorn. The same goes for CPU-bound code in imported Python packages. The only explanation I can think of is that maybe gunicorn installs some hooks that get triggered by some very common events in pure Python code execution.
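A minimal sketch of the isolation described in the comment above; sync_function_call and its exact body are reconstructed from the comment's description rather than taken from posted code:

import time
from datetime import datetime
from fastapi import FastAPI

app = FastAPI()

def sync_function_call(delay1: float, delay2: float):
    # Purely synchronous body: no awaits, so any remaining slowdown
    # measured here comes from plain CPU-bound / blocking Python code.
    total_start_time = datetime.now()
    times = []
    for _ in range(100):
        start_time = datetime.now()
        time.sleep(delay2)
        times.append(str(datetime.now() - start_time))
    return {"delays": [delay1, delay2], "total_time_taken": str(datetime.now() - total_start_time), "times": times}

@app.get("/delay/{delay1}/{delay2}")
async def get_delay(delay1: float, delay2: float):
    # The async endpoint only delegates; all timing happens in the sync helper.
    return sync_function_call(delay1, delay2)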
These are two engines optimized for different things. gunicorn was created for synchronous code, while uvicorn was created for asynchronous code. Also, there is a small possibility that uvicorn exposes uvloop's event loop instead of the built-in asyncio event loop, and the former is much faster than the latter. I am not sure about this, though; the benchmarks show good results: github.com/MagicStack/uvloop
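As a side note to the comment above: uvicorn's --loop option (which defaults to auto-detection) lets you pin the loop implementation explicitly, which makes it easier to benchmark the uvloop effect in isolation; the module name below just follows the test app used earlier:

# force the standard asyncio event loop
uvicorn performance_test:app --port 8083 --loop asyncio

# force uvloop (requires the uvloop package to be installed)
uvicorn performance_test:app --port 8083 --loop uvloop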
My suggestion is that you should not care too much about performance unless it is a hard constraint of your project. If ASGI servers are available, use one of those (which makes sense, since you are using an ASGI framework); otherwise use a WSGI one such as gunicorn. The former are optimized for running the asynchronous functions of fastapi, the latter are not.

Answer 3:
Since fastapi is an ASGI framework, it provides better performance when served by an ASGI server such as uvicorn or hypercorn. A WSGI server such as gunicorn cannot deliver performance like uvicorn, because ASGI servers are optimized for asynchronous functions. The official fastapi documentation also encourages using an ASGI server such as uvicorn or hypercorn:
https://fastapi.tiangolo.com/#installation
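For reference, the installation the linked documentation describes looks roughly like this; the [standard] extra is what pulls in optional speedups such as uvloop, which ties in with Update 2 in the first answer:

pip install fastapi
pip install "uvicorn[standard]"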
Comments:
Consider that gunicorn can be used together with uvicorn to take advantage of multiple cores/CPUs.
gunicorn can be used to serve ASGI and is one of the recommended ways of serving uvicorn: uvicorn.org/deployment/#gunicorn
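A sketch of that recommended multi-worker setup from the linked deployment page; the worker count here is only an illustrative value:

gunicorn performance_test:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind localhost:8001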