triton-inference-server fails to start with "Internal - failed to load all models"

Posted by 修炼之路

Error message

  • Start tritonserver
docker run --gpus=1 --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /full_path/deploy/models/:/models nvcr.io/nvidia/tritonserver:21.03-py3 tritonserver --model-repository=/models

When starting tritonserver, it fails with an "Internal - failed to load all models" error. The log looks like this:

+-----------+---------+----------------------------------------------------------------------------------------------------+
| Model     | Version | Status                                                                                             |
+-----------+---------+----------------------------------------------------------------------------------------------------+
| resnet152 | 1       | UNAVAILABLE: Internal - failed to load all models          |
+-----------+---------+----------------------------------------------------------------------------------------------------+
I0420 16:14:07.481496 1 server.cc:280] Waiting for in-flight requests to complete.
I0420 16:14:07.481506 1 model_repository_manager.cc:435] LiveBackendStates()
I0420 16:14:07.481512 1 server.cc:295] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models

Root cause analysis

This error is usually caused by a TensorRT version mismatch: the TensorRT version used to convert the model (e.g. from ONNX) into a TensorRT engine differs from the TensorRT version bundled in the tritonserver Docker image. The fix is therefore straightforward: re-convert the model with the same TensorRT version that tritonserver ships.
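To confirm the mismatch before rebuilding anything, you can compare the TensorRT version on both sides. A minimal check, assuming the NGC image installs TensorRT as Debian packages (the dpkg query is an assumption about the image layout):

#Print the TensorRT (libnvinfer) package version bundled in the tritonserver image
docker run --rm nvcr.io/nvidia/tritonserver:21.03-py3 bash -c "dpkg -l | grep nvinfer"
#Print the TensorRT version in the environment that built the engine file
python3 -c "import tensorrt; print(tensorrt.__version__)"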

Solution

  • Enter the TensorRT container
docker run --gpus all -it --rm -v /full_path/deploy/models/:/models nvcr.io/nvidia/tensorrt:21.03-py3
#Go to the TensorRT installation directory, which contains the trtexec executable
#tritonserver relies on this same TensorRT runtime to load models
cd /workspace/tensorrt/bin

The -v flag maps the host model directory into the container, so we don't need to copy the model files.

  • Test whether TensorRT can load the model
trtexec --loadEngine=resnet152.engine
#Output
[06/25/2021-22:28:38] [I] Host Latency
[06/25/2021-22:28:38] [I] min: 3.96118 ms (end to end 3.97363 ms)
[06/25/2021-22:28:38] [I] max: 4.36243 ms (end to end 8.4928 ms)
[06/25/2021-22:28:38] [I] mean: 4.05112 ms (end to end 7.76932 ms)
[06/25/2021-22:28:38] [I] median: 4.02783 ms (end to end 7.79443 ms)
[06/25/2021-22:28:38] [I] percentile: 4.35217 ms at 99% (end to end 8.46191 ms at 99%)
[06/25/2021-22:28:38] [I] throughput: 250.151 qps
[06/25/2021-22:28:38] [I] walltime: 1.75494 s
[06/25/2021-22:28:38] [I] Enqueue Time
[06/25/2021-22:28:38] [I] min: 2.37549 ms
[06/25/2021-22:28:38] [I] max: 3.47607 ms
[06/25/2021-22:28:38] [I] median: 2.49707 ms
[06/25/2021-22:28:38] [I] GPU Compute
[06/25/2021-22:28:38] [I] min: 3.90149 ms
[06/25/2021-22:28:38] [I] max: 4.29773 ms
[06/25/2021-22:28:38] [I] mean: 3.98691 ms
[06/25/2021-22:28:38] [I] median: 3.96387 ms
[06/25/2021-22:28:38] [I] percentile: 4.28748 ms at 99%
[06/25/2021-22:28:38] [I] total compute time: 1.75025 s
&&&& PASSED TensorRT.trtexec

If the output ends with PASSED, the engine loaded successfully. Now let's look at a failing case:

[06/26/2021-22:09:27] [I] === Device Information ===
[06/26/2021-22:09:27] [I] Selected Device: GeForce RTX 3090
[06/26/2021-22:09:27] [I] Compute Capability: 8.6
[06/26/2021-22:09:27] [I] SMs: 82
[06/26/2021-22:09:27] [I] Compute Clock Rate: 1.725 GHz
[06/26/2021-22:09:27] [I] Device Global Memory: 24265 MiB
[06/26/2021-22:09:27] [I] Shared Memory per SM: 100 KiB
[06/26/2021-22:09:27] [I] Memory Bus Width: 384 bits (ECC disabled)
[06/26/2021-22:09:27] [I] Memory Clock Rate: 9.751 GHz
[06/26/2021-22:09:27] [I] 
[06/26/2021-22:09:27] [I] TensorRT version: 8000
[06/26/2021-22:09:28] [I] [TRT] [MemUsageChange] Init CUDA: CPU +443, GPU +0, now: CPU 449, GPU 551 (MiB)
[06/26/2021-22:09:28] [I] [TRT] Loaded engine size: 222 MB
[06/26/2021-22:09:28] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 449 MiB, GPU 551 MiB
[06/26/2021-22:09:28] [E] Error[1]: [stdArchiveReader.cpp::StdArchiveReader::34] Error Code 1: Serialization (Version tag does not match. Note: Current Version: 43, Serialized Engine Version: 96)
[06/26/2021-22:09:28] [E] Error[4]: [runtime.cpp::deserializeCudaEngine::74] Error Code 4: Internal Error (Engine deserialization failed.)
[06/26/2021-22:09:28] [E] Engine creation failed
[06/26/2021-22:09:28] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8000]
#Or you may see this variant:
[06/25/2021-19:08:23] [I] Memory Clock Rate: 9.751 GHz
[06/25/2021-19:08:23] [I] 
[06/25/2021-19:08:25] [E] [TRT] INVALID_CONFIG: The engine plan file is not compatible with this version of TensorRT, expecting library version 7.2.3 got 7.2.2, please rebuild.
[06/25/2021-19:08:25] [E] [TRT] engine.cpp (1646) - Serialization Error in deserialize: 0 (Core engine deserialization failure)
[06/25/2021-19:08:25] [E] [TRT] INVALID_STATE: std::exception
[06/25/2021-19:08:25] [E] [TRT] INVALID_CONFIG: Deserialize the cuda engine failed.
[06/25/2021-19:08:25] [E] Engine creation failed
[06/25/2021-19:08:25] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec

The errors above are typical of a TensorRT version mismatch. There are two ways to fix it: first, re-export the model's engine file with a matching TensorRT version; second, switch to a tritonserver version whose bundled TensorRT matches the version used to build the engine file.

Method 1

Pull a tensorrt image whose version tag matches your tritonserver image, for example:

#Pull the tritonserver image
docker pull nvcr.io/nvidia/tritonserver:21.03-py3
#Pull the tensorrt image with the matching version tag
docker pull nvcr.io/nvidia/tensorrt:21.03-py3

Once the pull completes, re-convert the model inside the matching tensorrt image.
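As a concrete sketch of that re-conversion (the ONNX file name and the Triton repository layout below are illustrative assumptions, not taken from the original setup):

docker run --gpus all -it --rm -v /full_path/deploy/models/:/models nvcr.io/nvidia/tensorrt:21.03-py3
cd /workspace/tensorrt/bin
#Rebuild the engine with this container's TensorRT; Triton conventionally expects
#engines at <model-repository>/<model_name>/<version>/model.plan
./trtexec --onnx=/models/resnet152/resnet152.onnx --saveEngine=/models/resnet152/1/model.plan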

Method 2

Alternatively, go to the NVIDIA image registry and pull a tritonserver image whose bundled TensorRT matches the version used to build the engine (see the tritonserver image list for available releases); a verification sketch follows below.
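Each monthly NGC release pins a specific TensorRT version, so you can verify a candidate tritonserver image before switching to it. The 21.02 tag below is only an example, and the dpkg query again assumes the image packages TensorRT as Debian packages; check NVIDIA's framework support matrix for the authoritative version pairing:

#Pull a candidate release and confirm which TensorRT (libnvinfer) version it ships
docker pull nvcr.io/nvidia/tritonserver:21.02-py3
docker run --rm nvcr.io/nvidia/tritonserver:21.02-py3 bash -c "dpkg -l | grep nvinfer"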
