This is a collection of simplified TensorRT samples to get you started with TensorRT programming. Most of the samples are written in C++; a few are written in Python to show the basics.
For the C++ samples, install TensorRT, modify the Makefile to match your setup, and run make. For the Python samples, install the TensorRT Python wheel, then install PyTorch and onnx_graphsurgeon with pip before running the scripts prefixed with "app".
This is a basic sample which shows how to build and run an engine with a static-shaped input (which we'll call a "static-shape engine" for short) and how to save the engine to disk.
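As a rough sketch of what building and saving such an engine looks like with the raw TensorRT Python API (the one-layer network below is a placeholder, and the calls shown are the TensorRT 7/8-era interface, not the TrtLite class itself):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

# A placeholder one-layer network with a static input shape.
x = network.add_input("x", trt.float32, (1, 3, 224, 224))
network.mark_output(network.add_identity(x).get_output(0))

config = builder.create_builder_config()
config.max_workspace_size = 1 << 30  # 1 GiB of workspace for the builder

engine = builder.build_engine(network, config)

# Serialize the engine so it can be loaded from disk later.
with open("static.engine", "wb") as f:
    f.write(engine.serialize())
```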
This sample introduces a reusable class TrtLite, which will be used throughout these samples. The class is concise (~300 lines of code) yet covers most functions of TensorRT and simplifies its programming.
This sample shows how to build and run an engine with a dynamic-shaped input (a "dynamic-shape engine" for short), and how to copy data and run the engine asynchronously. You may use Nsight Systems to see the timeline of GPU events.
It's important to overlap the data copies (host-to-device for input, device-to-host for output) with engine execution. This keeps the GPU running inference tasks back to back and maximizes throughput. The same technique can be applied to the other samples as well.
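A minimal sketch of the asynchronous dynamic-shape pattern, assuming pycuda for the CUDA calls and an engine built with a suitable optimization profile (variable names continue the sketch above; the actual sample implements this in C++ via TrtLite):

```python
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda

# When building, register an optimization profile for the dynamic input,
# e.g. (continuing the builder sketch above, with input shape (-1, 3, 224, 224)):
#   profile = builder.create_optimization_profile()
#   profile.set_shape("x", (1, 3, 224, 224), (4, 3, 224, 224), (16, 3, 224, 224))
#   config.add_optimization_profile(profile)

context = engine.create_execution_context()
context.set_binding_shape(0, (4, 3, 224, 224))  # choose the actual input shape

h_in = cuda.pagelocked_empty((4, 3, 224, 224), np.float32)
h_out = cuda.pagelocked_empty(tuple(context.get_binding_shape(1)), np.float32)
d_in, d_out = cuda.mem_alloc(h_in.nbytes), cuda.mem_alloc(h_out.nbytes)

stream = cuda.Stream()
# Copy-in, inference and copy-out are all enqueued on the same stream, so
# work submitted on other streams can overlap with them.
cuda.memcpy_htod_async(d_in, h_in, stream)
context.execute_async_v2([int(d_in), int(d_out)], stream.handle)
cuda.memcpy_dtoh_async(h_out, d_out, stream)
stream.synchronize()
```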
This sample shows how to load an engine from disk and run it.
It also shows how to refit an engine. To refit an FP16 engine efficiently, you need to save the build logs to a file; see the inline comments in the source file for details.
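A sketch of loading and refitting with the Python API, assuming the engine was built with the REFIT flag (the layer name below is hypothetical):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)

# Deserialize an engine previously saved with engine.serialize().
with open("static.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Refitting replaces weights without a full rebuild; it requires the engine
# to have been built with config.set_flag(trt.BuilderFlag.REFIT).
refitter = trt.Refitter(engine, logger)
# refitter.set_weights("conv1", trt.WeightsRole.KERNEL, new_kernel)  # hypothetical layer name
# assert refitter.refit_cuda_engine()
```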
This sample shows 3 use cases (a minimal int8 build sketch follows the list):
- Build a static-shape engine and run it in int8.
- Build a dynamic-shape engine and run it in int8.
- Build an engine from a quantization-aware trained (QAT) network and run it in int8.
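A minimal sketch of the int8 build configuration with the Python API, continuing the builder sketch above (MyCalibrator is a hypothetical calibrator class):

```python
import tensorrt as trt

# Continuing from the builder/network/config of the first sketch.
config.set_flag(trt.BuilderFlag.INT8)
# For post-training quantization, attach a calibrator that feeds
# representative input batches. MyCalibrator is a hypothetical subclass of
# trt.IInt8EntropyCalibrator2 implementing get_batch() and the cache methods.
config.int8_calibrator = MyCalibrator()
# For a QAT network, the quantization scales come from the network itself,
# so no calibrator is needed.
engine = builder.build_engine(network, config)
```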
This sample shows how to write a simple static-shape plugin. The plugin supports fp32/fp16/int8 precision. Please note that the plugin interface IPluginV2IOExt is intended for static shapes only.
This sample is similar to AppPlugin but works with dynamic shapes. Please note that the plugin interface IPluginV2DynamicExt is intended for dynamic shapes only.
This sample shows how to create multiple contexts from the same engine, which saves device memory by sharing the network weights, and how to run each engine context on its own stream.
Like AppDynamicShape, it also runs asynchronously, and you may use Nsight Systems to see the timeline of GPU events.
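A sketch of the multi-context pattern, reusing pycuda as above (all_bindings is a hypothetical list holding each context's device buffer pointers):

```python
import pycuda.driver as cuda

# Two execution contexts share one engine, and therefore one copy of the
# network weights in device memory; each context gets its own stream.
contexts = [engine.create_execution_context() for _ in range(2)]
streams = [cuda.Stream() for _ in range(2)]

# all_bindings[i] holds the device buffer pointers for context i,
# allocated as in the dynamic-shape sketch above.
for ctx, stream, bindings in zip(contexts, streams, all_bindings):
    ctx.execute_async_v2(bindings, stream.handle)
for stream in streams:
    stream.synchronize()
```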
This sample loads an engine from disk and benchmarks how many queries per second (QPS) can be achieved at maximum throughput. By default the sample loads an engine created by a Python sample together with the command-line utility trtexec (see below).
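A simplified, synchronous way to estimate QPS (the actual sample overlaps copies and execution for a tighter bound; context and bindings are from the earlier sketches):

```python
import time

# Warm up, then time a fixed number of inferences to estimate QPS.
for _ in range(10):
    context.execute_v2(bindings)

n = 1000
start = time.perf_counter()
for _ in range(n):
    context.execute_v2(bindings)
print(f"{n / (time.perf_counter() - start):.1f} QPS")
```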
This sample shows how to build an engine from an ONNX file with the ONNX parser. Please note that trtexec, the command-line utility shipped with the official TensorRT release, also has this functionality.
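A sketch of parsing an ONNX file with the Python API ("model.onnx" is a placeholder path):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# Populate the network from the ONNX file instead of building it by hand.
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("failed to parse the ONNX model")

config = builder.create_builder_config()
engine = builder.build_engine(network, config)
```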
This sample shows how to build, save and run static-shape and dynamic-shape engines. TensorRT can be programmed with C++ or Python; a C++ program can load and run an engine saved by a Python program, and vice versa.
This sample shows how to export a PyTorch model to ONNX. The script then runs the command-line utility trtexec to convert the ONNX file into a TensorRT engine file, which can be loaded and run (for example, by AppThroughput).
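A sketch of the export step with a standard torchvision model (the model, file names and dynamic axes below are illustrative, not necessarily the ones the sample uses):

```python
import torch
import torchvision

# Export a pretrained model to ONNX with a dynamic batch dimension.
model = torchvision.models.resnet50(pretrained=True).eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model, dummy, "resnet50.onnx",
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
    opset_version=11)
# The ONNX file can then be converted into an engine with, e.g.:
#   trtexec --onnx=resnet50.onnx --saveEngine=resnet50.trt
```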
This sample shows how to export a PyTorch model containing an unsupported operator to ONNX, and how to modify the ONNX file with ONNX GraphSurgeon so that it can be converted into a TensorRT engine smoothly.
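A sketch of the graph-surgery step ("SomeUnsupportedOp" and "MyPluginOp" are hypothetical op names):

```python
import onnx
import onnx_graphsurgeon as gs

graph = gs.import_onnx(onnx.load("model.onnx"))

# Rewrite every node of a hypothetical unsupported op into an op for which
# a TensorRT plugin (or a supported equivalent) exists.
for node in graph.nodes:
    if node.op == "SomeUnsupportedOp":  # hypothetical op name
        node.op = "MyPluginOp"          # must match a registered plugin
graph.cleanup().toposort()
onnx.save(gs.export_onnx(graph), "model_modified.onnx")
```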