TensorRT_Inference_Demo

A repo that uses TensorRT to deploy well-trained models. Supports RT-DETR, YOLO-NAS, YOLOv5, YOLOv6, YOLOv7, YOLOv8 and YOLOX.

1.Introduction

This repo uses TensorRT 8.x to deploy well-trained models. Both image preprocessing and postprocessing are performed with CUDA, which enables high-speed inference.
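As a rough illustration of the CUDA preprocessing idea (a minimal sketch, not the repo's actual kernel: the kernel name, buffer layout and scaling are assumptions), the following converts an interleaved BGR uint8 image into a planar, normalized float tensor, i.e. the HWC-to-CHW plus scaling part of typical YOLO preprocessing; resizing/letterboxing is omitted:

#include <cuda_runtime.h>
#include <cstdint>

// Minimal sketch: convert an interleaved BGR uint8 image (HWC) into a
// planar RGB float tensor (CHW) scaled to [0, 1]. The resize/letterbox
// step that a full YOLO pipeline also needs is omitted here.
__global__ void hwc2chw_normalize(const uint8_t* __restrict__ src,
                                  float* __restrict__ dst,
                                  int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int in_idx  = (y * width + x) * 3;   // interleaved BGR
    int plane   = width * height;
    int out_idx = y * width + x;

    dst[0 * plane + out_idx] = src[in_idx + 2] / 255.0f;  // R
    dst[1 * plane + out_idx] = src[in_idx + 1] / 255.0f;  // G
    dst[2 * plane + out_idx] = src[in_idx + 0] / 255.0f;  // B
}

// Example launch for a 640x640 input:
//   dim3 block(16, 16), grid((640 + 15) / 16, (640 + 15) / 16);
//   hwc2chw_normalize<<<grid, block>>>(d_src, d_dst, 640, 640);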

2.Update

  • 2023.05.01 🚀 Create the repo.
  • 2023.05.03 🚀 Support YOLOv5 detection.
  • 2023.05.05 🚀 Support YOLOv7 and YOLOv5 instance segmentation.
  • 2023.05.10 🚀 Support YOLOv8 detection and instance segmentation.
  • 2023.05.12 🚀 Support CUDA preprocessing for faster inference.
  • 2023.05.16 🚀 Support CUDA box postprocessing.
  • 2023.05.19 🚀 Support CUDA mask postprocessing and RT-DETR.
  • 2023.05.21 🚀 Support YOLOv6.
  • 2023.05.26 🚀 Support dynamic batch inference.
  • 2023.06.07 🚀 Support YOLOX and YOLO-NAS.

3.Supported Models


All speed tests were performed on an RTX 3090 with the COCO val set. The time measured here is the sum of image loading, preprocessing, inference and postprocessing, so the reported FPS is lower than the inference-only numbers in the papers (see the measurement sketch after the table).

Models              BatchSize  Mode  Resolution  FPS
YOLOv5-s v7.0       1          FP32  640x640     200
YOLOv5-s v7.0       32         FP32  640x640     246
YOLOv5-seg-s v7.0   1          FP32  640x640     155
YOLOv6-s v3         1          FP32  640x640     163
YOLOv7              1          FP32  640x640     107
YOLOv8-s            1          FP32  640x640     171
YOLOv8-seg-s        1          FP32  640x640     122
YOLOX-s             1          FP32  640x640     156
YOLO-NAS-s          1          FP32  640x640     165
RT-DETR             1          FP32  640x640     106
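For reference, FPS figures of this kind can be produced with a measurement loop along these lines (a simplified sketch; process_one stands in for the repo's full per-image pipeline and is not an actual function of this repo):

#include <chrono>
#include <functional>
#include <string>
#include <vector>

// One "frame" here covers image loading, preprocessing, inference and
// postprocessing, so the resulting FPS is lower than inference-only
// benchmarks. `process_one` is a placeholder for the full pipeline.
double measure_fps(const std::vector<std::string>& image_paths,
                   const std::function<void(const std::string&)>& process_one)
{
    auto start = std::chrono::steady_clock::now();
    for (const auto& path : image_paths)
        process_one(path);  // load + preprocess + infer + postprocess
    double seconds =
        std::chrono::duration<double>(std::chrono::steady_clock::now() - start).count();
    return image_paths.empty() ? 0.0 : image_paths.size() / seconds;
}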

4.Usage

  1. Clone the repo.
git clone https://github.com/Li-Hongda/TensorRT_Inference_Demo.git
  2. Install the dependencies.

TensorRT

Follow the NVIDIA official docs to install TensorRT.

yaml-cpp

git clone https://github.com/jbeder/yaml-cpp
cd yaml-cpp
mkdir build && cd build
cmake ..
make -j20
# build the shared library as well
cmake -DYAML_BUILD_SHARED_LIBS=on ..
make -j20
cd ../..
  3. Set the TensorRT path and the CUDA path in the build configuration to match your installation. Then,
cd TensorRT_Inference_Demo/object_detection
mkdir build && cd build
cmake ..
make -j$(nproc)
  4. Get the ONNX model from the corresponding official repository and put it in weights/MODEL_NAME, then modify the corresponding configuration file in configs. Take YOLOv5 as an example (a sketch of how such a dynamic ONNX model is turned into a TensorRT engine follows these steps):
python export.py --weights=yolov5s.pt  --dynamic --simplify --include=onnx --opset 11
  5. If compilation succeeds, the executable will be generated in bin in the repo directory. Then enjoy yourself with a command like this:
cd bin
./object_detection yolov5 /path/to/input/dir 
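Since the exported ONNX models use dynamic axes, the TensorRT engine has to be built with an optimization profile. Below is a condensed sketch of that step with the TensorRT 8.x C++ API; the input tensor name "images", the 640x640 resolution and the 1-32 batch range follow the YOLOv5 example above and are assumptions, not necessarily the repo's exact code:

#include <NvInfer.h>
#include <NvOnnxParser.h>
#include <memory>

using namespace nvinfer1;

// Minimal sketch of building a dynamic-batch engine from an ONNX file
// with TensorRT 8.x. Error handling and the ILogger implementation are omitted.
IHostMemory* buildEngine(ILogger& logger, const char* onnxPath)
{
    auto builder = std::unique_ptr<IBuilder>(createInferBuilder(logger));
    auto network = std::unique_ptr<INetworkDefinition>(builder->createNetworkV2(
        1U << static_cast<uint32_t>(NetworkDefinitionCreationFlag::kEXPLICIT_BATCH)));
    auto parser = std::unique_ptr<nvonnxparser::IParser>(
        nvonnxparser::createParser(*network, logger));
    parser->parseFromFile(onnxPath, static_cast<int>(ILogger::Severity::kWARNING));

    auto config = std::unique_ptr<IBuilderConfig>(builder->createBuilderConfig());
    // One optimization profile covering batch sizes 1..32 for the assumed
    // input tensor "images" at 640x640.
    auto* profile = builder->createOptimizationProfile();
    profile->setDimensions("images", OptProfileSelector::kMIN, Dims4{1, 3, 640, 640});
    profile->setDimensions("images", OptProfileSelector::kOPT, Dims4{16, 3, 640, 640});
    profile->setDimensions("images", OptProfileSelector::kMAX, Dims4{32, 3, 640, 640});
    config->addOptimizationProfile(profile);

    // Serialized engine; the caller can write it to disk and reload it later.
    return builder->buildSerializedNetwork(*network, *config);
}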

Notes:

  1. The model output expected by the post-processing code has shape num_bboxes x num_pred, where num_bboxes depends on the input resolution (imageHeight x imageWidth) and num_pred = num_cls + coordinates + confidence. The native output of YOLOv8, however, is num_pred x num_bboxes, which means the predicted values of the same box are not contiguous in memory. For convenience, the corresponding dimensions of the original PyTorch output need to be transposed when exporting the ONNX model (see the indexing sketch after these notes).
  2. A dynamic-shape engine is convenient but sacrifices some inference speed compared with a static engine of the same batch size. Therefore, if you want to pursue the fastest inference, it is better to export an ONNX model with a fixed batch size, such as 32.
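To make note 1 concrete, the difference between the two memory layouts looks like this (an illustrative sketch; the function and variable names are made up for the example):

// Layout expected by the post-processing here: [num_bboxes, num_pred],
// i.e. all values of one box are contiguous in memory.
inline float box_value(const float* out, int num_pred, int box, int field) {
    return out[box * num_pred + field];
}

// Native (untransposed) YOLOv8 layout: [num_pred, num_bboxes],
// i.e. the values of one box are strided by num_bboxes -- which is why
// the output is transposed when exporting the ONNX model.
inline float box_value_yolov8_raw(const float* out, int num_bboxes, int box, int field) {
    return out[field * num_bboxes + box];
}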

5.Results

Bilibili Demo:

6.Reference

[0] https://github.com/NVIDIA/TensorRT
[1] https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#c_topics
[2] https://github.com/linghu8812/tensorrt_inference
[3] https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#
[4] https://blog.csdn.net/bobchen1017?type=blog