
PIDNet_TensorRT

This repository provides a step-by-step guide and code for optimizing PIDNet, a state-of-the-art real-time semantic segmentation model, using TorchScript, ONNX, and TensorRT.

Prerequisites

Device: NVIDIA GeForce RTX 3050

  • CUDA: 12.0 (driver: 525)
  • cuDNN: 8.9
  • TensorRT: 8.6
  • PyCUDA

Device: NVIDIA Jetson Nano

  • Jetpack: 4.6.2
  • PyCUDA

Usage

0. Setup

  • Clone this repository and download the pretrained model from the official PIDNet repository.

1. Export the model

For TorchScript:

python tools/export.py --a pidnet-s --p ./pretrained_models/cityscapes/PIDNet_S_Cityscapes_test.pt --f torchscript

For ONNX:

python tools/export.py --a pidnet-s --p ./pretrained_models/cityscapes/PIDNet_S_Cityscapes_test.pt --f onnx

For TensorRT (using the above ONNX model):

trtexec --onnx=path/to/onnx/model --saveEngine=path/to/engine 
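
The export commands above wrap the standard PyTorch export APIs. The sketch below illustrates the pattern they presumably follow (torch.jit.trace for TorchScript, torch.onnx.export for ONNX); the get_pred_model import, the output file names, and the 1 x 3 x 1024 x 2048 dummy input are assumptions for illustration, not the exact code in tools/export.py.

# Export sketch (illustrative only; tools/export.py is the authoritative script).
import torch
from models.pidnet import get_pred_model  # assumed import path into the PIDNet code base

model = get_pred_model(name="pidnet_s", num_classes=19).eval()
# Loading the pretrained Cityscapes weights is omitted here for brevity.

dummy_input = torch.randn(1, 3, 1024, 2048)  # Cityscapes-sized input

# TorchScript: trace the graph with a fixed-size dummy input and serialize it.
traced = torch.jit.trace(model, dummy_input)
traced.save("pidnet_s_cityscapes.torchscript.pt")  # hypothetical output path

# ONNX: export the same graph so trtexec / ONNX Runtime can consume it.
torch.onnx.export(
    model,
    dummy_input,
    "pidnet_s_cityscapes.onnx",  # hypothetical output path
    opset_version=11,
    input_names=["input"],
    output_names=["output"],
)

When building the engine with trtexec, adding --fp16 usually gives a further speed-up on GPUs with fast FP16 support, at a small cost in accuracy.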

2. Inference

python tools/inference.py --f pytorch
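
For the TensorRT path, inference amounts to deserializing the engine and driving it through PyCUDA. The sketch below is a minimal example assuming the TensorRT 8.x binding API and an engine built with static shapes from the ONNX model above; the engine path and the dummy input are placeholders, and tools/inference.py handles the real pre- and post-processing.

# Minimal TensorRT inference sketch using PyCUDA (TensorRT 8.x binding API).
import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

ENGINE_PATH = "pidnet_s_cityscapes.engine"  # hypothetical path

logger = trt.Logger(trt.Logger.WARNING)
with open(ENGINE_PATH, "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate page-locked host buffers and device buffers for every binding
# (assumes the engine was built with static input/output shapes).
stream = cuda.Stream()
host_bufs, dev_bufs, bindings = {}, {}, []
for name in engine:
    shape = engine.get_binding_shape(name)
    dtype = trt.nptype(engine.get_binding_dtype(name))
    host_bufs[name] = cuda.pagelocked_empty(trt.volume(shape), dtype)
    dev_bufs[name] = cuda.mem_alloc(host_bufs[name].nbytes)
    bindings.append(int(dev_bufs[name]))

# Fill the input with a dummy Cityscapes-sized image (preprocessing omitted).
input_name = [n for n in engine if engine.binding_is_input(n)][0]
host_bufs[input_name][:] = np.random.rand(host_bufs[input_name].size).astype(
    host_bufs[input_name].dtype
)

# Host-to-device copy, asynchronous execution, device-to-host copy, synchronize.
cuda.memcpy_htod_async(dev_bufs[input_name], host_bufs[input_name], stream)
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
for name in engine:
    if not engine.binding_is_input(name):
        cuda.memcpy_dtoh_async(host_bufs[name], dev_bufs[name], stream)
stream.synchronize()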

3. Speed Measurement

  • Measure the inference speed of PIDNet-S for Cityscapes:
python models/speed/pidnet_speed.py --f all
Framework                  FPS      % increase
PyTorch                    24.72    -
TorchScript                27.09    9.59
ONNX (with TensorRT EP)    33.52    35.60
TensorRT                   32.93    33.21

The speed test was performed on a single NVIDIA GeForce RTX 3050 GPU.
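
The numbers above come from models/speed/pidnet_speed.py. The sketch below shows the GPU timing pattern such a benchmark typically uses (warm-up iterations, torch.cuda.synchronize around the timed loop); the model constructor and iteration counts are assumptions, not the script's exact settings.

# FPS measurement sketch for the PyTorch path (illustrative only;
# models/speed/pidnet_speed.py is the repository's actual benchmark).
import time
import torch
from models.pidnet import get_pred_model  # assumed import path

device = torch.device("cuda")
model = get_pred_model(name="pidnet_s", num_classes=19).eval().to(device)
x = torch.randn(1, 3, 1024, 2048, device=device)  # Cityscapes-sized input

with torch.no_grad():
    # Warm-up so kernel compilation and cuDNN autotuning do not skew the timing.
    for _ in range(10):
        model(x)
    torch.cuda.synchronize()

    iters = 100
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()  # wait for all queued GPU work before stopping the clock
    elapsed = time.perf_counter() - start

print(f"FPS: {iters / elapsed:.2f}")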

Acknowledgement

  1. PIDNet