triton-inference-server fails to start with "Internal - failed to load all models"
Error message
- Start tritonserver
docker run --gpus=1 --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /full_path/deploy/models/:/models nvcr.io/nvidia/tritonserver:21.03-py3 tritonserver --model-repository=/models
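For reference, the directory passed to --model-repository must follow Triton's standard layout. A minimal sketch for the resnet152 model used here (by default the TensorRT backend looks for an engine named model.plan; a different file name, such as the resnet152.engine seen later in this article, requires setting default_model_filename in config.pbtxt):
/full_path/deploy/models/
└── resnet152/
    ├── config.pbtxt     # model configuration
    └── 1/               # version directory
        └── model.plan   # serialized TensorRT engine (default file name)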
When tritonserver starts, it fails with an Internal - failed to load all models error. The log output looks like this:
+-----------+---------+----------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+-----------+---------+----------------------------------------------------------------------------------------------------+
| resnet152 | 1 | UNAVAILABLE: Internal - failed to load all models features |
+-----------+---------+----------------------------------------------------------------------------------------------------+
I0420 16:14:07.481496 1 server.cc:280] Waiting for in-flight requests to complete.
I0420 16:14:07.481506 1 model_repository_manager.cc:435] LiveBackendStates()
I0420 16:14:07.481512 1 server.cc:295] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
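The server-level message is terse. Before digging in, it can help to rerun the server with higher log verbosity; --log-verbose is a standard tritonserver flag, though how much extra detail it surfaces for this particular failure is an assumption:
docker run --gpus=1 --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    -v /full_path/deploy/models/:/models \
    nvcr.io/nvidia/tritonserver:21.03-py3 \
    tritonserver --model-repository=/models --log-verbose=1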
Cause analysis
This error is almost always caused by a TensorRT version mismatch: the TensorRT version used to convert the model (for example, from ONNX) into a TensorRT engine differs from the TensorRT version bundled inside the tritonserver Docker image. The fix is correspondingly simple: re-convert the model with the same TensorRT version that tritonserver ships.
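To confirm the mismatch before re-converting, compare the TensorRT versions in the two images. A minimal sketch, assuming the tensorrt image ships the Python bindings and that the tritonserver image installs TensorRT under the usual Debian library path:
# Print the TensorRT version bundled in the tensorrt image:
docker run --rm nvcr.io/nvidia/tensorrt:21.03-py3 \
    python3 -c "import tensorrt; print(tensorrt.__version__)"
# List the TensorRT runtime library in the tritonserver image;
# the version is encoded in the file name (the path is an assumption):
docker run --rm --entrypoint /bin/bash nvcr.io/nvidia/tritonserver:21.03-py3 \
    -c "ls /usr/lib/x86_64-linux-gnu/libnvinfer.so.*"
The trtexec banner, visible in the logs below (e.g. "TensorRT version: 8000"), reports the same information.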
Solution
- Enter the tensorrt container
docker run --gpus all -it --rm -v /full_path/deploy/models/:/models nvcr.io/nvidia/tensorrt:21.03-py3
# Change into the TensorRT install directory, which contains the trtexec executable.
# tritonserver relies on this same TensorRT runtime to load the model.
cd /workspace/tensorrt/bin
The -v flag maps the host model directory into the container, so there is no need to copy the model files.
- Test whether TensorRT can load the model successfully
trtexec --loadEngine=resnet152.engine
# Output
[06/25/2021-22:28:38] [I] Host Latency
[06/25/2021-22:28:38] [I] min: 3.96118 ms (end to end 3.97363 ms)
[06/25/2021-22:28:38] [I] max: 4.36243 ms (end to end 8.4928 ms)
[06/25/2021-22:28:38] [I] mean: 4.05112 ms (end to end 7.76932 ms)
[06/25/2021-22:28:38] [I] median: 4.02783 ms (end to end 7.79443 ms)
[06/25/2021-22:28:38] [I] percentile: 4.35217 ms at 99% (end to end 8.46191 ms at 99%)
[06/25/2021-22:28:38] [I] throughput: 250.151 qps
[06/25/2021-22:28:38] [I] walltime: 1.75494 s
[06/25/2021-22:28:38] [I] Enqueue Time
[06/25/2021-22:28:38] [I] min: 2.37549 ms
[06/25/2021-22:28:38] [I] max: 3.47607 ms
[06/25/2021-22:28:38] [I] median: 2.49707 ms
[06/25/2021-22:28:38] [I] GPU Compute
[06/25/2021-22:28:38] [I] min: 3.90149 ms
[06/25/2021-22:28:38] [I] max: 4.29773 ms
[06/25/2021-22:28:38] [I] mean: 3.98691 ms
[06/25/2021-22:28:38] [I] median: 3.96387 ms
[06/25/2021-22:28:38] [I] percentile: 4.28748 ms at 99%
[06/25/2021-22:28:38] [I] total compute time: 1.75025 s
&&&& PASSED TensorRT.trtexec
If the output ends with PASSED, the engine loaded successfully. Next, a failing case:
[06/26/2021-22:09:27] [I] === Device Information ===
[06/26/2021-22:09:27] [I] Selected Device: GeForce RTX 3090
[06/26/2021-22:09:27] [I] Compute Capability: 8.6
[06/26/2021-22:09:27] [I] SMs: 82
[06/26/2021-22:09:27] [I] Compute Clock Rate: 1.725 GHz
[06/26/2021-22:09:27] [I] Device Global Memory: 24265 MiB
[06/26/2021-22:09:27] [I] Shared Memory per SM: 100 KiB
[06/26/2021-22:09:27] [I] Memory Bus Width: 384 bits (ECC disabled)
[06/26/2021-22:09:27] [I] Memory Clock Rate: 9.751 GHz
[06/26/2021-22:09:27] [I]
[06/26/2021-22:09:27] [I] TensorRT version: 8000
[06/26/2021-22:09:28] [I] [TRT] [MemUsageChange] Init CUDA: CPU +443, GPU +0, now: CPU 449, GPU 551 (MiB)
[06/26/2021-22:09:28] [I] [TRT] Loaded engine size: 222 MB
[06/26/2021-22:09:28] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 449 MiB, GPU 551 MiB
[06/26/2021-22:09:28] [E] Error[1]: [stdArchiveReader.cpp::StdArchiveReader::34] Error Code 1: Serialization (Version tag does not match. Note: Current Version: 43, Serialized Engine Version: 96)
[06/26/2021-22:09:28] [E] Error[4]: [runtime.cpp::deserializeCudaEngine::74] Error Code 4: Internal Error (Engine deserialization failed.)
[06/26/2021-22:09:28] [E] Engine creation failed
[06/26/2021-22:09:28] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8000]
# Or you may see this variant
[06/25/2021-19:08:23] [I] Memory Clock Rate: 9.751 GHz
[06/25/2021-19:08:23] [I]
[06/25/2021-19:08:25] [E] [TRT] INVALID_CONFIG: The engine plan file is not compatible with this version of TensorRT, expecting library version 7.2.3 got 7.2.2, please rebuild.
[06/25/2021-19:08:25] [E] [TRT] engine.cpp (1646) - Serialization Error in deserialize: 0 (Core engine deserialization failure)
[06/25/2021-19:08:25] [E] [TRT] INVALID_STATE: std::exception
[06/25/2021-19:08:25] [E] [TRT] INVALID_CONFIG: Deserialize the cuda engine failed.
[06/25/2021-19:08:25] [E] Engine creation failed
[06/25/2021-19:08:25] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec
The error messages above are typical of a TensorRT version mismatch. There are two ways to resolve it: either re-export the model's engine file with the matching TensorRT version, or switch to a tritonserver version whose bundled TensorRT matches the one used to build the engine file.
Method 1
Pull the tensorrt image whose release tag matches your tritonserver image (NGC images from the same monthly release ship the same TensorRT version), for example:
# Pull the tritonserver image
docker pull nvcr.io/nvidia/tritonserver:21.03-py3
# Pull the matching tensorrt image
docker pull nvcr.io/nvidia/tensorrt:21.03-py3
Once both images are pulled, re-convert the model with the tensorrt image of the matching version.
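For reference, a conversion sketch run inside the tensorrt container. The ONNX file name and output path are assumptions; model.plan inside the numeric version directory is the TensorRT backend's default engine name:
# Run inside nvcr.io/nvidia/tensorrt:21.03-py3 (paths are illustrative):
/workspace/tensorrt/bin/trtexec \
    --onnx=/models/resnet152/resnet152.onnx \
    --saveEngine=/models/resnet152/1/model.plan \
    --explicitBatch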
Method 2
Go to the NVIDIA NGC registry and pull a tritonserver image whose bundled TensorRT matches the version used to build the engine file. All available versions are listed in the tritonserver image list.
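The failure log itself points to the version you need: "expecting library version 7.2.3 got 7.2.2" means the engine was built with TensorRT 7.2.2, so pick the container release that bundles that version (NVIDIA's Frameworks Support Matrix maps each release tag to its TensorRT version). A sketch, where the 21.02 tag is only illustrative and must be verified against the matrix:
docker pull nvcr.io/nvidia/tritonserver:21.02-py3
docker run --gpus=1 --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    -v /full_path/deploy/models/:/models \
    nvcr.io/nvidia/tritonserver:21.02-py3 tritonserver --model-repository=/models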