TensorRT FB

Posted by WhateverYoung


TensorRT

An inference (forward-pass) framework provided by NVIDIA as an API, primarily a C++ interface. It fully exploits the GPU's compute capability to improve throughput and latency.

Model conversion:
- Define the model by hand through the API: you must know the weight format of the training framework and the layout each TRT layer expects, then convert the trained weights and assign them to the TRT layers
- CaffeParser, imports Caffe models (see the sketch after this list)
- UffParser, imports UFF-format models
- OnnxParser, imports ONNX-format models
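
To make the parser route concrete, here is a minimal sketch of the CaffeParser path. This is a sketch only, assuming a TensorRT 4-era API; the file names deploy.prototxt / model.caffemodel and the output blob name "prob" are placeholders.

#include "NvInfer.h"
#include "NvCaffeParser.h"
using namespace nvinfer1;
using namespace nvcaffeparser1;

// Fill an already-created INetworkDefinition from Caffe files.
// The file names and the blob name "prob" are placeholders.
void importCaffeModel(INetworkDefinition& network)
{
    ICaffeParser* parser = createCaffeParser();
    const IBlobNameToTensor* blobToTensor =
        parser->parse("deploy.prototxt", "model.caffemodel", network, DataType::kFLOAT);
    network.markOutput(*blobToTensor->find("prob")); // mark the blob we want returned
    parser->destroy();
}

The other parsers follow the same shape: create the parser, let it populate the network definition, then mark the outputs.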

download and install

Just follow the install documentation; you may run into the following issues:
https://codeyarns.com/2015/07/31/pip-install-error-with-pycuda/
https://blog.csdn.net/xll_bit/article/details/78376320

developer guide

https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html

  • kinds of supported layers
  • key concepts: Network Definition, Tensor, Layer, DataType, Builder, Runtime,
    PlanFile, Engine, ExecutionContext
  • INT8 low-precision, high-performance computation
  • FP16 low-precision, high-performance computation
  • the main optimizations are graph redundancy elimination, layer fusion, lower precision, and exhaustive selection of the fastest kernel for the concrete parameters (a sketch of the precision switches follows this list)
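
For illustration, a rough sketch of the FP16/INT8 switches on a TensorRT 4-era IBuilder. setHalf2Mode is the TRT 4 name (later releases expose setFp16Mode), and `calibrator` stands in for a user-supplied IInt8Calibrator; both are assumptions, not prescriptions.

#include "NvInfer.h"
using namespace nvinfer1;

// Enable lower-precision paths; call before buildCudaEngine().
void enableLowPrecision(IBuilder& builder, IInt8Calibrator& calibrator)
{
    // FP16 only pays off on GPUs with fast fp16 hardware
    if (builder.platformHasFastFp16())
        builder.setHalf2Mode(true);   // TRT 4 API; later releases: setFp16Mode(true)

    // INT8 needs a calibrator feeding representative input batches
    if (builder.platformHasFastInt8())
    {
        builder.setInt8Mode(true);
        builder.setInt8Calibrator(&calibrator);
    }
}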

Workflow

Train -> fix batch size and precision, define the network -> optimize with TensorRT -> plan file (can be serialized to disk) -> validate and run inference

build phase and execute phase

IBuilder->createNetwork
INetworkDefinition->addInput, addLayer(s), markOutput
IBuilder->buildCudaEngine(INetworkDefinition)
ICudaEngine->createExecutionContext() / serialize()
IExecutionContext->bind buffers to the input and output tensors
IExecutionContext->enqueue or execute
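
A minimal sketch of the build phase behind those calls, assuming a TensorRT 4-era API; the input name "data", the 1x28x28 shape, and the batch/workspace sizes are placeholders.

#include <iostream>
#include "NvInfer.h"
using namespace nvinfer1;

// TensorRT requires an ILogger; this minimal one drops INFO-level messages.
class Logger : public ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity != Severity::kINFO)
            std::cout << msg << std::endl;
    }
} gLogger;

// Build an engine for a hand-defined network; weights come from step one below.
ICudaEngine* buildEngine()
{
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();

    // Tensors carry only shape/type information at this point
    ITensor* data = network->addInput("data", DataType::kFLOAT, DimsCHW{1, 28, 28});
    // ... addConvolution / addFullyConnected / ... consuming `data`, then:
    // network->markOutput(*outputTensor);

    builder->setMaxBatchSize(1);           // batch size is fixed at build time
    builder->setMaxWorkspaceSize(1 << 20); // scratch memory TensorRT may use

    ICudaEngine* engine = builder->buildCudaEngine(*network);
    IHostMemory* plan = engine->serialize(); // plan->data()/plan->size() can go to disk
    (void)plan;
    return engine;
}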

IRuntime->deserializeCudaEngine()
ICudaEngine->createExecutionContext()
IExecutionContext->bind buffers to the input and output tensors
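
The corresponding runtime path, again as a sketch (passing nullptr for the plugin factory assumes the network has no custom layers):

#include "NvInfer.h"
using namespace nvinfer1;

// Recreate an engine from a plan blob previously written by engine->serialize().
ICudaEngine* loadEngine(ILogger& logger, const void* planData, size_t planSize)
{
    IRuntime* runtime = createInferRuntime(logger);
    return runtime->deserializeCudaEngine(planData, planSize, nullptr);
}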

Step one: prepare all the weights the network needs. Step two: define the network with the builder; at this stage tensors carry only shape information, much like a TensorFlow computation graph: you define the computation rules, and every tensor is mapped by name to the real memory buffer it will need at runtime. Up until execute/enqueue the network is only a definition, and at this point it can be persisted to disk. Step three: allocate memory and bind all input and output buffers to tensors, using names and binding indices, then run synchronously or asynchronously to get the computed outputs.
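
Putting step three into code, a hedged sketch of buffer binding and execution; the binding names "data" and "prob" and the element counts are placeholders for whatever the network actually defines.

#include <cuda_runtime_api.h>
#include "NvInfer.h"
using namespace nvinfer1;

// One inference pass: bind device buffers by name/index, then run on a stream.
void infer(ICudaEngine& engine, const float* hostInput, float* hostOutput,
           int batchSize, size_t inCount, size_t outCount)
{
    IExecutionContext* ctx = engine.createExecutionContext();

    // Binding indices are looked up by tensor name
    void* buffers[2];
    const int inIdx  = engine.getBindingIndex("data");
    const int outIdx = engine.getBindingIndex("prob");
    cudaMalloc(&buffers[inIdx],  batchSize * inCount  * sizeof(float));
    cudaMalloc(&buffers[outIdx], batchSize * outCount * sizeof(float));

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMemcpyAsync(buffers[inIdx], hostInput,
                    batchSize * inCount * sizeof(float), cudaMemcpyHostToDevice, stream);
    ctx->enqueue(batchSize, buffers, stream, nullptr); // async; execute() is the sync twin
    cudaMemcpyAsync(hostOutput, buffers[outIdx],
                    batchSize * outCount * sizeof(float), cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    cudaFree(buffers[inIdx]);
    cudaFree(buffers[outIdx]);
    ctx->destroy();
}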

samples

  • sampleMNIST, CaffeParser
  • sampleUFFMNIST, UffParser
  • sampleMNISTAPI, API-built network
  • sampleGoogleNet, CaffeParser and FP16 and profiling
  • sampleCharRNN, API-built network, weights loaded from files
  • sampleINT8, weight conversion, how to use INT8
  • samplePlugin, CaffeParser + plugin
  • sampleFasterRCNN, TODO
  • sampleONNX, ONNXParser
  • sampleMLP, TODO

sampleNMT

  • ReadMe.txt helps a lot.
  • Supports a unidirectional encoder and a unidirectional decoder with Luong attention.
  • Bi-directional encoders and other attention variants such as scaled_luong are not supported.

First, train with TensorFlow-nmt and use chpt_to_bin.py to convert the weights from TensorFlow into TensorRT's bin files, which the sample's C++ code parses.
Then, run inference with TensorRT.
You can modify sampleNMT.cpp's global variables to target other vocabularies.

  • params: beam size, batch size…

TODO

  • How to use plugins …
  • CaffeParser/UffParser/OnnxParser …
  • RaggedSoftmax, Scale, RNNv2 layers …
  • sampleNMT beam-search algorithms …
  • run and study each sample …

reference

https://docs.nvidia.com/deeplearning/sdk/index.html#deep-learning-sdk
https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/topics/index.html
https://developer.nvidia.com/tensorrt
https://developer.nvidia.com/nvidia-tensorrt-4x-download
