DeepStream-Yolo

NVIDIA DeepStream SDK 5.1 configuration for YOLO models

Improvements on this repository

Darknet CFG params parser (no need to edit nvdsparsebbox_Yolo.cpp or any other file for native models)
Support for new_coords, beta_nms and scale_x_y params
Support for new models that aren't supported in official DeepStream SDK YOLO.
Support for layers that aren't supported in official DeepStream SDK YOLO.
Support for activations that aren't supported in official DeepStream SDK YOLO.
Support for Convolutional groups
Support for INT8 calibration (it isn't available for YOLOv5 models)
Support for non-square models

Tutorial

TensorRT conversion

Native (tested models below)
- YOLOv4x-Mish [cfg] [weights]
- YOLOv4-CSP [cfg] [weights]
- YOLOv4 [cfg] [weights]
- YOLOv4-Tiny [cfg] [weights]
- YOLOv3-SPP [cfg] [weights]
- YOLOv3 [cfg] [weights]
- YOLOv3-Tiny-PRN [cfg] [weights]
- YOLOv3-Tiny [cfg] [weights]
- YOLOv3-Lite [cfg] [weights]
- YOLOv3-Nano [cfg] [weights]
- YOLO-Fastest 1.1 [cfg] [weights]
- YOLO-Fastest-XL 1.1 [cfg] [weights]
- YOLOv2 [cfg] [weights]
- YOLOv2-Tiny [cfg] [weights]
External

Benchmark

mAP/FPS comparison between models

Requirements

NVIDIA DeepStream SDK 5.1
DeepStream-Yolo Native (for Darknet YOLO based models)
DeepStream-Yolo External (for PyTorch YOLOv5 based model)

Basic usage

git clone https://github.com/marcoslucianops/DeepStream-Yolo.git
cd DeepStream-Yolo/native

Download the cfg and weights files for your model and move them to the DeepStream-Yolo/native folder

Compile

x86 platform

CUDA_VER=11.1 make -C nvdsinfer_custom_impl_Yolo

Jetson platform

CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo
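The two compile commands differ only in CUDA_VER. As a minimal sketch, the choice could be derived from the machine architecture. The helper name pick_cuda_ver is hypothetical, and treating aarch64 as "Jetson" is an assumption that will not hold for every ARM board.

```shell
#!/usr/bin/env bash
# Map a machine architecture string to the CUDA version used in this README.
# pick_cuda_ver is a hypothetical helper, not part of the repository.
pick_cuda_ver() {
    case "$1" in
        x86_64)  echo "11.1" ;;   # x86 platform
        aarch64) echo "10.2" ;;   # Jetson platform (assumed)
        *)       echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Print the make command for the current machine without running it.
if cuda_ver="$(pick_cuda_ver "$(uname -m)")"; then
    echo "CUDA_VER=${cuda_ver} make -C nvdsinfer_custom_impl_Yolo"
fi
```

The script only prints the command, so you can check it against your installed CUDA version before running it.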

Edit config_infer_primary.txt for your model (example for YOLOv4)

[property]
...
# 0=RGB, 1=BGR, 2=GRAYSCALE
model-color-format=0
# CFG
custom-network-config=yolov4.cfg
# Weights
model-file=yolov4.weights
# Generated TensorRT model (will be created if it doesn't exist)
model-engine-file=model_b1_gpu0_fp32.engine
# Model labels file
labelfile-path=labels.txt
# Batch size
batch-size=1
# 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
# Number of classes in label file
num-detected-classes=80
...
[class-attrs-all]
# CONF_THRESH
pre-cluster-threshold=0.25
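A common mistake in this step is a num-detected-classes value that doesn't match the labels file. The sketch below compares the two. It assumes one label per line in labels.txt, and the helpers config_value and check_num_classes are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Read the value of a key=value line from a DeepStream-style config file.
# config_value and check_num_classes are hypothetical helpers.
config_value() {
    # $1 = key, $2 = config file; prints the value of the first uncommented match
    sed -n "s/^$1=//p" "$2" | head -n 1
}

# Succeed only if num-detected-classes equals the number of non-empty label lines.
check_num_classes() {
    # $1 = config file, $2 = labels file
    local expected actual
    expected="$(config_value num-detected-classes "$1")"
    actual="$(grep -c . "$2")"
    [ "$expected" = "$actual" ]
}
```

Possible usage: check_num_classes config_infer_primary.txt labels.txt && echo "class count matches"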

Run

deepstream-app -c deepstream_app_config.txt

If you want to use YOLOv2 or YOLOv2-Tiny models, change deepstream_app_config.txt before running

[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV2.txt

Note: config_infer_primary.txt uses cluster-mode=4 and applies NMS in code, with a threshold of 0.45 unless the Darknet cfg file defines beta_nms (in that case, NMS = beta_nms). config_infer_primary_yoloV2.txt instead uses cluster-mode=2 and sets NMS through nms-iou-threshold=0.45.
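To see which NMS threshold applies to a given native model, you can look for beta_nms in its cfg file. The sketch below only mirrors the rule stated in the note (beta_nms if present, otherwise 0.45). It does not call the repository's code, and effective_nms is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the NMS threshold the note describes for a Darknet cfg file:
# the first beta_nms value if the cfg defines one, otherwise 0.45.
# effective_nms is a hypothetical helper; it does not call any repository code.
effective_nms() {
    # $1 = path to a Darknet cfg file
    local value
    # Strip spaces and carriage returns, then take the text after "beta_nms=".
    value="$(tr -d ' \r' < "$1" | sed -n 's/^beta_nms=//p' | head -n 1)"
    echo "${value:-0.45}"
}
```

Possible usage: effective_nms yolov4.cfg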

INT8 calibration

Install OpenCV

sudo apt-get install libopencv-dev

Compile/recompile the nvdsinfer_custom_impl_Yolo lib with OpenCV support

x86 platform

cd DeepStream-Yolo/native
CUDA_VER=11.1 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo

Jetson platform

cd DeepStream-Yolo/native
CUDA_VER=10.2 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo

For the COCO dataset, download val2017, extract it, and move it to the DeepStream-Yolo/native folder

Select 1000 random images from COCO dataset to run calibration

mkdir calibration
for jpg in $(ls -1 val2017/*.jpg | sort -R | head -1000); do \
    cp ${jpg} calibration/; \
done
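If your image paths may contain spaces, a variant of the selection step that doesn't parse ls output can be more robust. This is a sketch, not the repository's method. It assumes GNU find and shuf are available, and sample_images is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Copy up to N randomly chosen .jpg files from one folder to another.
# sample_images is a hypothetical helper; it assumes GNU find and shuf.
sample_images() {
    # $1 = source folder, $2 = destination folder, $3 = number of images
    mkdir -p "$2"
    find "$1" -maxdepth 1 -type f -name '*.jpg' \
        | shuf -n "$3" \
        | while IFS= read -r jpg; do
              cp "$jpg" "$2/"
          done
}
```

Possible usage: sample_images val2017 calibration 1000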

Create the calibration.txt file with all selected images

realpath calibration/*jpg > calibration.txt

Set environment variables

export INT8_CALIB_IMG_PATH=calibration.txt
export INT8_CALIB_BATCH_SIZE=1
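Because calibration can take a long time, it may be worth checking the image list before starting. The sketch below counts list entries that don't point to an existing file. The helper count_missing_images is a hypothetical name, and the check itself is an addition, not something the calibration requires.

```shell
#!/usr/bin/env bash
# Count the entries in an image list file that do not point to an existing file.
# count_missing_images is a hypothetical helper, not part of the repository.
count_missing_images() {
    # $1 = list file with one image path per line
    local missing=0 path
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue              # skip blank lines
        [ -f "$path" ] || missing=$((missing + 1))
    done < "$1"
    echo "$missing"
}

# Report on the list named by INT8_CALIB_IMG_PATH, if it is set and readable.
if [ -r "${INT8_CALIB_IMG_PATH:-}" ]; then
    echo "missing images: $(count_missing_images "$INT8_CALIB_IMG_PATH")"
fi
```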

Change the config_infer_primary.txt file from

...
model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
...
network-mode=0
...

to

...
model-engine-file=model_b1_gpu0_int8.engine
int8-calib-file=calib.table
...
network-mode=1
...
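The same three changes can be scripted. The following is a sketch under the assumption that each key appears at the start of a line, as in the config excerpt in this section. enable_int8 is a hypothetical helper name, and you may want to keep a backup of the file before trying it.

```shell
#!/usr/bin/env bash
# Rewrite the three INT8-related keys of a config file in place.
# enable_int8 is a hypothetical helper; the values are the ones from this section.
enable_int8() {
    # $1 = path to config_infer_primary.txt
    local tmp
    tmp="$(mktemp)" || return 1
    sed -E \
        -e 's|^model-engine-file=.*|model-engine-file=model_b1_gpu0_int8.engine|' \
        -e 's|^#?int8-calib-file=.*|int8-calib-file=calib.table|' \
        -e 's|^network-mode=.*|network-mode=1|' \
        "$1" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back through redirection so the original file permissions are kept.
    cat "$tmp" > "$1" && rm -f "$tmp"
}
```

Possible usage: enable_int8 config_infer_primary.txt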

Run

deepstream-app -c deepstream_app_config.txt

Note: NVIDIA recommends at least 500 images to get good accuracy. In this example I used 1000 images to get better accuracy (more images = more accuracy). Higher INT8_CALIB_BATCH_SIZE values will increase the accuracy and calibration speed. Set it according to your GPU memory. This process can take a long time. Calibration isn't available for YOLOv5 models.

mAP/FPS comparison between models

valid = val2017 (COCO)
NMS = 0.45 (changed to beta_nms when used in Darknet cfg file) / 0.6 (YOLOv5 models)
pre-cluster-threshold = 0.001 (mAP eval) / 0.25 (FPS measurement)
batch-size = 1
FPS measurement display width = 1920
FPS measurement display height = 1080
NOTE: An NVIDIA GTX 1050 (4GB Mobile) was used for evaluation. maintain-aspect-ratio=1 was set in the config_infer file for YOLOv4 (with letter_box=1) and YOLOv5 models. For INT8 calibration, 1000 random images from val2017 (COCO) and INT8_CALIB_BATCH_SIZE=1 were used.

TensorRT	Precision	Resolution	IoU=0.5:0.95	IoU=0.5	IoU=0.75	FPS (with display)	FPS (without display)
YOLOv5x 5.0	FP32	640	0.	0.	0.	.	.
YOLOv5l 5.0	FP32	640	0.	0.	0.	.	.
YOLOv5m 5.0	FP32	640	0.	0.	0.	.	.
YOLOv5s 5.0	FP32	640	0.	0.	0.	.	.
YOLOv5s 5.0	FP32	416	0.	0.	0.	.	.
YOLOv4x-MISH	FP32	640	0.461	0.649	0.499	.	.
YOLOv4x-MISH	INT8	640	0.443	0.629	0.479	.	.
YOLOv4x-MISH	FP32	608	0.461	0.650	0.496	.	.
YOLOv4-CSP	FP32	640	0.443	0.632	0.477	.	.
YOLOv4-CSP	FP32	608	0.443	0.632	0.477	.	.
YOLOv4-CSP	FP32	512	0.437	0.625	0.471	.	.
YOLOv4-CSP	INT8	512	0.414	0.601	0.447	.	.
YOLOv4	FP32	640	0.492	0.729	0.547	.	.
YOLOv4	FP32	608	0.499	0.739	0.551	.	.
YOLOv4	INT8	608	0.483	0.728	0.534	.	.
YOLOv4	FP32	512	0.492	0.730	0.542	.	.
YOLOv4	FP32	416	0.468	0.702	0.507	.	.
YOLOv3-SPP	FP32	608	0.412	0.687	0.434	.	.
YOLOv3	FP32	608	0.378	0.674	0.389	.	.
YOLOv3	INT8	608	0.381	0.677	0.388	.	.
YOLOv3	FP32	416	0.373	0.669	0.379	.	.
YOLOv2	FP32	608	0.211	0.365	0.220	.	.
YOLOv2	FP32	416	0.207	0.362	0.211	.	.
YOLOv4-Tiny	FP32	416	0.216	0.403	0.207	.	.
YOLOv4-Tiny	INT8	416	0.203	0.385	0.192	.	.
YOLOv3-Tiny-PRN	FP32	416	0.168	0.381	0.126	.	.
YOLOv3-Tiny-PRN	INT8	416	0.155	0.358	0.113	.	.
YOLOv3-Tiny	FP32	416	0.096	0.203	0.080	.	.
YOLOv2-Tiny	FP32	416	0.084	0.194	0.062	.	.
YOLOv3-Lite	FP32	416	0.169	0.356	0.137	.	.
YOLOv3-Lite	FP32	320	0.158	0.328	0.132	.	.
YOLOv3-Nano	FP32	416	0.128	0.278	0.099	.	.
YOLOv3-Nano	FP32	320	0.122	0.260	0.099	.	.
YOLO-Fastest-XL	FP32	416	0.160	0.342	0.130	.	.
YOLO-Fastest-XL	FP32	320	0.158	0.329	0.135	.	.
YOLO-Fastest	FP32	416	0.101	0.230	0.072	.	.
YOLO-Fastest	FP32	320	0.102	0.232	0.073	.	.

Extract metadata

You can get metadata from DeepStream in Python and C++. For C++, you need to edit the deepstream-app or deepstream-test code. For Python, you need to install and edit deepstream_python_apps.

You need to work with NvDsObjectMeta (Python/C++), NvDsFrameMeta (Python/C++) and NvOSD_RectParams (Python/C++) to get the label, position, etc. of the bounding boxes.

In the C++ deepstream-app application, your code needs to be in the analytics_done_buf_prob function. In the C++/Python deepstream-test applications, your code needs to be in the osd_sink_pad_buffer_probe/tiler_src_pad_buffer_probe function.

Python is slightly slower than C (about 5-10%).

This code is open-source. You can use it as you want. :)

My projects: https://www.youtube.com/MarcosLucianoTV