
PIDNet_TensorRT

This repository provides a step-by-step guide and code for optimizing PIDNet, a state-of-the-art real-time semantic segmentation model, using TorchScript, ONNX, and TensorRT.

Prerequisites

Device: NVIDIA GeForce RTX 3050

  • CUDA: 12.0 (driver: 525)
  • cuDNN: 8.9
  • TensorRT: 8.6
  • PyCUDA

Device: NVIDIA Jetson Nano

  • Jetpack: 4.6.2
  • PyCUDA

Usage

0. Setup

  • Clone this repository and download the pretrained model from the official PIDNet repository.

1. Export the model

For TorchScript:

python tools/export.py --a pidnet-s --p ./pretrained_models/cityscapes/PIDNet_S_Cityscapes_test.pt --f torchscript

For ONNX:

python tools/export.py --a pidnet-s --p ./pretrained_models/cityscapes/PIDNet_S_Cityscapes_test.pt --f onnx

For TensorRT (using the above ONNX model):

trtexec --onnx=path/to/onnx/model --saveEngine=path/to/engine 
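
The export commands above wrap the standard PyTorch export APIs. The sketch below illustrates the pattern they presumably follow (torch.jit.trace for TorchScript, torch.onnx.export for ONNX); the get_pred_model import, the output file names, and the 1 x 3 x 1024 x 2048 dummy input are assumptions for illustration, not the exact code in tools/export.py.

# Export sketch (illustrative only; tools/export.py is the authoritative script).
import torch
from models.pidnet import get_pred_model  # assumed import path into the PIDNet code base

model = get_pred_model(name="pidnet_s", num_classes=19).eval()
# Loading the pretrained Cityscapes weights is omitted here for brevity.

dummy_input = torch.randn(1, 3, 1024, 2048)  # Cityscapes-sized input

# TorchScript: trace the graph with a fixed-size dummy input and serialize it.
traced = torch.jit.trace(model, dummy_input)
traced.save("pidnet_s_cityscapes.torchscript.pt")  # hypothetical output path

# ONNX: export the same graph so trtexec / ONNX Runtime can consume it.
torch.onnx.export(
    model,
    dummy_input,
    "pidnet_s_cityscapes.onnx",  # hypothetical output path
    opset_version=11,
    input_names=["input"],
    output_names=["output"],
)

When building the engine with trtexec, adding --fp16 usually gives a further speed-up on GPUs with fast FP16 support, at a small cost in accuracy.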

2. Inference

python tools/inference.py --f pytorch
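
For the TensorRT path, inference amounts to deserializing the engine and driving it through PyCUDA. The sketch below is a minimal example assuming the TensorRT 8.x binding API and an engine built with static shapes from the ONNX model above; the engine path and the dummy input are placeholders, and tools/inference.py handles the real pre- and post-processing.

# Minimal TensorRT inference sketch using PyCUDA (TensorRT 8.x binding API).
import numpy as np
import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

ENGINE_PATH = "pidnet_s_cityscapes.engine"  # hypothetical path

logger = trt.Logger(trt.Logger.WARNING)
with open(ENGINE_PATH, "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Allocate page-locked host buffers and device buffers for every binding
# (assumes the engine was built with static input/output shapes).
stream = cuda.Stream()
host_bufs, dev_bufs, bindings = {}, {}, []
for name in engine:
    shape = engine.get_binding_shape(name)
    dtype = trt.nptype(engine.get_binding_dtype(name))
    host_bufs[name] = cuda.pagelocked_empty(trt.volume(shape), dtype)
    dev_bufs[name] = cuda.mem_alloc(host_bufs[name].nbytes)
    bindings.append(int(dev_bufs[name]))

# Fill the input with a dummy Cityscapes-sized image (preprocessing omitted).
input_name = [n for n in engine if engine.binding_is_input(n)][0]
host_bufs[input_name][:] = np.random.rand(host_bufs[input_name].size).astype(
    host_bufs[input_name].dtype
)

# Host-to-device copy, asynchronous execution, device-to-host copy, synchronize.
cuda.memcpy_htod_async(dev_bufs[input_name], host_bufs[input_name], stream)
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
for name in engine:
    if not engine.binding_is_input(name):
        cuda.memcpy_dtoh_async(host_bufs[name], dev_bufs[name], stream)
stream.synchronize()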

3. Speed Measurement

  • Measure the inference speed of PIDNet-S for Cityscapes:
python models/speed/pidnet_speed.py --f all
Framework                  FPS      % increase
PyTorch                    24.72    -
TorchScript                27.09    9.59
ONNX (with TensorRT EP)    33.52    35.60
TensorRT                   32.93    33.21

The speed test was performed on a single NVIDIA GeForce RTX 3050 GPU.
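
The numbers above come from models/speed/pidnet_speed.py. The sketch below shows the GPU timing pattern such a benchmark typically uses (warm-up iterations, torch.cuda.synchronize around the timed loop); the model constructor and iteration counts are assumptions, not the script's exact settings.

# FPS measurement sketch for the PyTorch path (illustrative only;
# models/speed/pidnet_speed.py is the repository's actual benchmark).
import time
import torch
from models.pidnet import get_pred_model  # assumed import path

device = torch.device("cuda")
model = get_pred_model(name="pidnet_s", num_classes=19).eval().to(device)
x = torch.randn(1, 3, 1024, 2048, device=device)  # Cityscapes-sized input

with torch.no_grad():
    # Warm-up so kernel compilation and cuDNN autotuning do not skew the timing.
    for _ in range(10):
        model(x)
    torch.cuda.synchronize()

    iters = 100
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()  # wait for all queued GPU work before stopping the clock
    elapsed = time.perf_counter() - start

print(f"FPS: {iters / elapsed:.2f}")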

Acknowledgement

  1. PIDNet