TensorRT FB
Posted by WhateverYoung
TensorRT
A forward-inference framework from NVIDIA, provided as an API with a mainly C++ interface. It exploits the full compute capability of the GPU to raise throughput and lower latency.
Model conversion:
- Define the model by hand with the API: you need to know both the weight format of the training framework and the layout each TRT layer expects, then convert the trained weights and assign them to the TRT layers
- CaffeParser, imports a Caffe model (see the sketch below)
- UffParser, imports a UFF-format model
- OnnxParser, imports an ONNX-format model
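To make the parser route concrete, here is a minimal sketch of the CaffeParser path against the TensorRT 3/4-era C++ API. The file names and the output blob name "prob" are assumptions for illustration; error handling is omitted.

```cpp
#include <NvInfer.h>
#include <NvCaffeParser.h>
#include <cstdio>

using namespace nvinfer1;
using namespace nvcaffeparser1;

// Minimal logger required by the TensorRT entry points.
static class : public ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity != Severity::kINFO) std::fprintf(stderr, "%s\n", msg);
    }
} gLogger;

// Build an engine from a trained Caffe model. The file names and the
// output blob name "prob" are placeholders for your own model.
ICudaEngine* buildFromCaffe()
{
    IBuilder* builder = createInferBuilder(gLogger);
    INetworkDefinition* network = builder->createNetwork();
    ICaffeParser* parser = createCaffeParser();

    // Populate the network definition from the deploy file and weights.
    const IBlobNameToTensor* blobs = parser->parse(
        "deploy.prototxt", "model.caffemodel", *network, DataType::kFLOAT);

    // Tell TensorRT which tensor is the network output.
    network->markOutput(*blobs->find("prob"));

    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(16 << 20); // scratch space for kernel tactics
    ICudaEngine* engine = builder->buildCudaEngine(*network);

    parser->destroy();
    network->destroy();
    builder->destroy();
    return engine;
}
```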
download and install
Follow the install documentation. You may run into the following issues:
https://codeyarns.com/2015/07/31/pip-install-error-with-pycuda/
https://blog.csdn.net/xll_bit/article/details/78376320
developer guide
https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html
- kinds of supported layers
- key concepts: Network Definition, Tensor, Layer, DataType, Builder, Runtime, PlanFile, Engine, ExecutionContext
- INT8 low-precision, high-performance compute
- FP16 low-precision, high-performance compute
- The main optimizations are redundancy elimination on the graph, layer fusion, low precision, and exhaustive selection of the fastest kernel for the concrete parameters (a sketch of the precision flags follows)
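A rough sketch of how the low-precision modes are switched on at build time. The flag names vary across TensorRT versions (older releases use setHalf2Mode; newer ones move these onto IBuilderConfig), so treat this as illustrative rather than definitive:

```cpp
#include <NvInfer.h>

using namespace nvinfer1;

// Enable FP16/INT8 kernels at build time where the hardware supports them.
// INT8 additionally needs a calibrator that feeds representative input
// batches so TensorRT can measure activation ranges and pick scales.
void enableLowPrecision(IBuilder& builder, IInt8Calibrator* calibrator)
{
    if (builder.platformHasFastFp16())
        builder.setFp16Mode(true);   // allow FP16 kernels during tactic selection

    if (calibrator && builder.platformHasFastInt8())
    {
        builder.setInt8Mode(true);   // allow INT8 kernels
        builder.setInt8Calibrator(calibrator);
    }
}
```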
WorkFlow
Train -> fix batch size and precision, define the network -> optimize with TensorRT -> plan file (can be serialized to disk) -> validate and run inference
build phase and execute phase
IBuilder->createNetwork()
INetworkDefinition->addInput / addLayer / markOutput
IBuilder->buildCudaEngine(INetworkDefinition)
ICudaEngine->createExecutionContext() / serialize()
IExecutionContext->bind buffers to input and output tensors
IExecutionContext->enqueue or execute
IRuntime->deserializeCudaEngine()
ICudaEngine->createExecutionContext()
IExecutionContext->bind buffers to input and output tensors
Step 1: prepare all the weights the network needs. Step 2: define the network with the builder; at this point tensors carry only shape information, much like a TensorFlow graph, with the computation rules defined and every tensor mapped by name to the real runtime buffer it will eventually use. Until execute/enqueue, this is still just a network definition, and the optimized network can be persisted to disk at this stage. Step 3: allocate memory, bind all input and output buffers to their tensors by name and index, then execute synchronously or asynchronously to get the outputs. A sketch of both phases follows.
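Putting the three steps together, here is a minimal sketch of both phases, assuming an already-built engine with one input blob "data" and one output blob "prob" (placeholder names), batch size 1, and FP32 I/O; error handling is omitted.

```cpp
#include <NvInfer.h>
#include <cuda_runtime_api.h>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

using namespace nvinfer1;

static class : public ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity != Severity::kINFO) std::fprintf(stderr, "%s\n", msg);
    }
} gLogger;

// End of the build phase: persist the optimized engine (the plan file).
void savePlan(ICudaEngine& engine, const char* path)
{
    IHostMemory* plan = engine.serialize();
    std::ofstream(path, std::ios::binary)
        .write(static_cast<const char*>(plan->data()), plan->size());
    plan->destroy();
}

// Start of the execute phase: restore the engine without rebuilding.
ICudaEngine* loadPlan(const char* path)
{
    std::ifstream in(path, std::ios::binary);
    std::vector<char> blob((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
    IRuntime* runtime = createInferRuntime(gLogger);
    return runtime->deserializeCudaEngine(blob.data(), blob.size(), nullptr);
}

// Step 3: bind real buffers to the tensors by name/index and run.
void infer(ICudaEngine& engine, const float* hostIn, float* hostOut,
           size_t inBytes, size_t outBytes)
{
    IExecutionContext* ctx = engine.createExecutionContext();

    // The name->index mapping ties each tensor to its device buffer.
    void* buffers[2];
    const int inIdx  = engine.getBindingIndex("data");
    const int outIdx = engine.getBindingIndex("prob");
    cudaMalloc(&buffers[inIdx], inBytes);
    cudaMalloc(&buffers[outIdx], outBytes);

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMemcpyAsync(buffers[inIdx], hostIn, inBytes,
                    cudaMemcpyHostToDevice, stream);
    ctx->enqueue(1, buffers, stream, nullptr); // async; execute() is the sync form
    cudaMemcpyAsync(hostOut, buffers[outIdx], outBytes,
                    cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(buffers[inIdx]);
    cudaFree(buffers[outIdx]);
    ctx->destroy();
}
```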
samples
- sampleMNIST,CaffeParser
- sampleUFFMNIST,UffParser
- sampleMNISTAPI, network built with the API
- sampleGoogleNet, CaffeParser and FP16 and profiling
- sampleCharRNN, network built with the API, weights loaded from files
- sampleINT8, weight conversion, how to use INT8
- samplePlugin, CaffeParser + plugin
- sampleFasterRCNN, TODO
- sampleONNX, ONNXParser
- sampleMLP, TODO
sampleNMT
- ReadMe.txt helps a lot.
- Supports a unidirectional encoder and a unidirectional decoder with Luong attention.
- A bidirectional encoder and other attentions such as scaled_luong attention are not supported.
First, train with TensorFlow NMT and use chpt_to_bin.py to convert the weights from TensorFlow into TensorRT's bin files, which the sample's C++ code parses.
Then, run inference with TensorRT.
You can modify sampleNMT.cpp's global variables for other vocabularies.
- params: beam size, batch size…
TODO
- How to use plugins ….
- CaffeParser/UFFParser/ONNXParser …
- RaggedSoftMax, Scale, RNNv2 layers …
- sampleNMT beam-search algorithm …
- run and study each sample …
reference
https://docs.nvidia.com/deeplearning/sdk/index.html#deep-learning-sdk
https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/topics/index.html
https://developer.nvidia.com/tensorrt
https://developer.nvidia.com/nvidia-tensorrt-4x-download