triton server reports "The engine plan file is generated on an incompatible device"

Posted by 修炼之路


Error message

The following error is reported when starting triton inference server:

I0701 02:42:42.028366 1 cuda_memory_manager.cc:103] CUDA memory pool is created on device 0 with size 67108864
I0701 02:42:42.031240 1 model_repository_manager.cc:1065] loading: resnet152:1
E0701 02:43:00.935893 1 logging.cc:43] INVALID_CONFIG: The engine plan file is generated on an incompatible device, expecting compute 7.5 got compute 8.6, please rebuild.
E0701 02:43:00.935952 1 logging.cc:43] engine.cpp (1646) - Serialization Error in deserialize: 0 (Core engine deserialization failure)
E0701 02:43:00.993150 1 logging.cc:43] INVALID_STATE: std::exception
E0701 02:43:00.993215 1 logging.cc:43] INVALID_CONFIG: Deserialize the cuda engine failed.
E0701 02:43:01.002146 1 model_repository_manager.cc:1242] failed to load 'resnet152' version 1: Internal: unable to create TensorRT engine
I0701 02:43:01.002473 1 server.cc:570] 
+-----------+---------+---------------------------------------------------------+
| Model     | Version | Status                                                  |
+-----------+---------+---------------------------------------------------------+
| resnet152 | 1       | UNAVAILABLE: Internal: unable to create TensorRT engine |
+-----------+---------+---------------------------------------------------------+
I0701 02:43:01.002665 1 server.cc:233] Waiting for in-flight requests to complete.
I0701 02:43:01.002678 1 server.cc:248] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models

Solution

It is not hard to see from "The engine plan file is generated on an incompatible device" that the failure is caused by an incompatible device. The key line is "expecting compute 7.5 got compute 8.6": the GPU the server is running on has compute capability 7.5, while the engine plan was serialized on a GPU with compute capability 8.6. A TensorRT engine plan is specific to the compute capability it was built on and cannot be deserialized on a GPU with a different one.
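A quick way to confirm which compute capability each machine has is to print it from Python. This is a minimal sketch assuming PyTorch with CUDA support is installed; the CUDA deviceQuery sample reports the same information.

# Minimal check, assuming PyTorch with CUDA support is installed.
# Run this on both the machine where model.plan was built and the
# machine where triton server runs; the two results must match.
import torch

name = torch.cuda.get_device_name(0)                # e.g. "NVIDIA GeForce RTX 3090"
major, minor = torch.cuda.get_device_capability(0)  # e.g. (8, 6)
print(f"{name}: compute capability {major}.{minor}")

For the log above, the server machine would print compute capability 7.5 while the machine that built the plan prints 8.6.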

Check whether the GPU used when converting the ONNX model to model.plan is the same model as the GPU used when starting the server (strictly, whether the two GPUs have the same compute capability). For example, if you converted on an RTX 3090 (compute 8.6) but start the server on a machine with an RTX 2070 (compute 7.5), you get exactly this error. The fix is to regenerate model.plan with trtexec on the GPU that the server will actually run on, as sketched below.
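A minimal trtexec invocation, run on the deployment GPU, looks like this (the ONNX file name is a placeholder for your own model):

trtexec --onnx=resnet152.onnx --saveEngine=model.plan

Engine plans are also tied to the TensorRT version that serialized them, so the safest workflow is to run trtexec inside the same Triton/TensorRT container image that will serve the model, then copy the regenerated model.plan into the model repository (e.g. resnet152/1/model.plan) and restart the server.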
