深度学习系列15:tensorRT基础

Posted IE06

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了深度学习系列15:tensorRT基础相关的知识,希望对你有一定的参考价值。

tensorrt官方库:https://github.com/NVIDIA/TensorRT, git clone一下即可

1. onnx转tensorRT

首先放一张对比图:
在这里插入图片描述
使用官方的tensorrt包,找到里面的trtexec文件,按照下面的语句执行:

trtexec --onnx=onnx-modifier/result.onnx --batch=1 --saveEngine=onnx-modifier/result.trt --workspace=8196

即可获得tensorRT的trt模型。

2. 使用onnx进行推理

import onnxruntime as rt
import numpy as np
import cv2
sess = rt.InferenceSession("/....onnx")
input_name = sess.get_inputs()[0].name
img = cv2.resize(cv2.imread("....png"),(672,384)).astype(np.float32)
X = np.array([np.transpose(img, (2,0,1))])
pred_onnx = sess.run(None, {input_name: X})

3. 使用tensorRT进行推理

import tensorrt as trt
import os
import pycuda.driver as cuda
import cv2
import numpy as np
import pycuda.autoinit
class TensorRTInference(object):
    def __init__(self, engine_file_path, input_shape):
        self.engine_file_path = engine_file_path
        self.shape = input_shape       
        self.engine = self.load_engine()

    def load_engine(self):
        assert os.path.exists(self.engine_file_path)
        with open(self.engine_file_path, 'rb') as f, trt.Runtime(trt.Logger()) as runtime:
            engine_data = f.read()
            engine = runtime.deserialize_cuda_engine(engine_data)
            return engine

    def infer_once(self, img):
        engine = self.engine
        if len(img.shape) == 4:
            b, c, h, w = img.shape
        elif len(img.shape) == 3:
            c, h, w = img.shape
            b = 1
        with engine.create_execution_context() as context:
            context.set_binding_shape(engine.get_binding_index('input'), (b, c, h,w))
            bindings = []
            for binding in engine:
                binding_idx = engine.get_binding_index(binding)
                size = trt.volume(context.get_binding_shape(binding_idx))
                dtype = trt.nptype(engine.get_binding_dtype(binding))
                if engine.binding_is_input(binding):
                    input_buffer = np.ascontiguousarray(img, dtype).astype(np.float32)
                    input_memory = cuda.mem_alloc(img.nbytes)
                    bindings.append(int(input_memory))
                else:
                    output_buffer = cuda.pagelocked_empty(size, dtype)
                    bindings.append(int(output_memory))
            stream = cuda.Stream()
            cuda.memcpy_htod_async(input_memory, input_buffer, stream)
            context.execute_async(bindings=bindings, stream_handle=stream.handle)
            cuda.memcpy_dtoh_async(output_buffer, output_memory, stream)
            stream.synchronize()
            #res = np.reshape(output_buffer, (2, h, w))
        return output_buffer
INPUT_SHAPE = (224, 224)
engine_file_path = '***.trt'
img_path = 's1.png'
img = cv2.resize(cv2.imread(img_path), INPUT_SHAPE) # hwc
img = np.transpose(img, (2,0,1)).astype(np.float32) # chw
trt_infer = TensorRTInference(engine_file_path, INPUT_SHAPE)
engine = trt_infer.load_engine()
trt_infer.infer_once(img)

以上是关于深度学习系列15:tensorRT基础的主要内容,如果未能解决你的问题,请参考以下文章

深度学习 模型加速 TensorRT篇输入输出部署流程

[人工智能-深度学习-15]:神经网络基础 - 常见优化器Optimizer及其算法 - 梯度下降法系列

TensorRT8 meets Python TensorRT快速入门介绍

TensorRT-介绍-使用-安装

NVIDA-TensorRT部署

一个简单的tensorRT mnist推理案例,模型采用代码构建