DeepStream-Yolo

NVIDIA DeepStream SDK 5.0.1 configuration for YOLO models

Improvements on this repository

Darknet CFG params parser (not need to edit nvdsparsebbox_Yolo.cpp or another file for native models)
Support to new_coords, beta_nms and scale_x_y params
Support to new models not supported in official DeepStream SDK YOLO.
Support to layers not supported in official DeepStream SDK YOLO.
Support to activations not supported in official DeepStream SDK YOLO.
Support to Convolutional groups

Tutorial

Benchmark

mAP/FPS comparison between models

TensorRT conversion

Native (tested models below)
- YOLOv4x-Mish
- YOLOv4-CSP
- YOLOv4
- YOLOv4-Tiny
- YOLOv3-SSP
- YOLOv3
- YOLOv3-Tiny-PRN
- YOLOv3-Tiny
- YOLOv3-Lite
- YOLOv3-Nano
- YOLO-Fastest
- YOLO-Fastest-XL
- YOLOv2
- YOLOv2-Tiny
External
- YOLOv5

Request

Request native TensorRT conversion for your YOLO-based model

Requirements

NVIDIA DeepStream SDK 5.0.1
DeepStream-Yolo Native (for Darknet YOLO based models)
DeepStream-Yolo External (for PyTorch YOLOv5 based model)

mAP/FPS comparison between models

DeepStream SDK YOLOv4: https://youtu.be/Qi_F_IYpuFQ

Darknet YOLOv4: https://youtu.be/AxJJ9fnJ7Xk

NVIDIA GTX 1050 (4GB Mobile)

CUDA 10.2
Driver 440.33
TensorRT 7.2.1
cuDNN 8.0.5
OpenCV 3.2.0 (libopencv-dev)
OpenCV Python 4.4.0 (opencv-python)
PyTorch 1.7.0
Torchvision 0.8.1

TensorRT	Precision	Resolution	IoU=0.5:0.95	IoU=0.5	IoU=0.75	FPS (with display)	FPS (without display)
YOLOv5x	FP32	608	0.406	0.562	0.441	7.91	7.99
YOLOv5l	FP32	608	0.385	0.540	0.419	12.82	12.97
YOLOv5m	FP32	608	0.354	0.507	0.388	25.09	25.97
YOLOv5s	FP32	608	0.281	0.430	0.307	52.02	56.21
YOLOv4x-MISH	FP32	640	0.454	0.644	0.491	7.45	7.56
YOLOv4x-MISH	FP32	608	0.450	0.644	0.482	7.93	8.05
YOLOv4-CSP	FP32	608	0.434	0.628	0.465	13.74	14.11
YOLOv4-CSP	FP32	512	0.427	0.618	0.459	21.69	22.75
YOLOv4	FP32	608	0.490	0.734	0.538	11.72	12.09
YOLOv4	FP32	512	0.484	0.725	0.533	19.00	19.70
YOLOv4	FP32	416	0.456	0.693	0.491	22.63	23.81
YOLOv4	FP32	320	0.400	0.623	0.424	32.46	35.07
YOLOv3-SPP	FP32	608	0.411	0.680	0.436	11.85	12.12
YOLOv3	FP32	608	0.374	0.654	0.387	12.00	12.33
YOLOv3	FP32	416	0.369	0.651	0.379	23.19	24.55
YOLOv4-Tiny	FP32	416	0.195	0.382	0.175	144.55	176.31
YOLOv3-Tiny-PRN	FP32	416	0.168	0.369	0.130	181.71	244.47
YOLOv3-Tiny	FP32	416	0.165	0.357	0.128	154.19	190.42
YOLOv3-Lite	FP32	416	0.165	0.350	0.131	122.40	146.19
YOLOv3-Lite	FP32	320	0.155	0.324	0.128	163.76	204.21
YOLOv3-Nano	FP32	416	0.127	0.277	0.098	191.77	264.59
YOLOv3-Nano	FP32	320	0.122	0.258	0.099	207.04	269.89
YOLO-Fastest	FP32	416	0.092	0.213	0.062	174.26	221.05
YOLO-Fastest	FP32	320	0.090	0.201	0.068	199.48	258.56
YOLO-FastestXL	FP32	416	0.144	0.306	0.115	121.89	145.13
YOLO-FastestXL	FP32	320	0.136	0.279	0.117	162.65	199.75
YOLOv2	FP32	608	0.286	0.534	0.274	23.92	25.47
YOLOv2-Tiny	FP32	416	0.103	0.251	0.064	165.01	203.02

Darknet	Precision	Resolution	IoU=0.5:0.95	IoU=0.5	IoU=0.75	FPS (with display)	FPS (without display)
YOLOv4x-MISH	FP32	640	0.495	0.682	0.538	5.3	5.5
YOLOv4x-MISH	FP32	608	0.493	0.680	0.535	5.4	5.6
YOLOv4-CSP	FP32	608	0.473	0.661	0.515	9.2	9.5
YOLOv4-CSP	FP32	512	0.458	0.645	0.496	13.6	14.0
YOLOv4	FP32	608	0.513	0.748	0.574	7.3	7.5
YOLOv4	FP32	512	0.506	0.738	0.564	11.8	12.3
YOLOv4	FP32	416	0.479	0.709	0.527	15.4	15.8
YOLOv4	FP32	320	0.421	0.638	0.454	21.0	21.7
YOLOv3-SPP	FP32	608	0.432	0.701	0.465	6.9	7.1
YOLOv3	FP32	608	0.391	0.672	0.412	7.0	7.3
YOLOv3	FP32	416	0.384	0.668	0.402	16.3	16.9
YOLOv4-Tiny	FP32	416	0.203	0.388	0.189	68.0	112.5
YOLOv3-Tiny-PRN	FP32	416	0.172	0.378	0.133	71.6	143.9
YOLOv3-Tiny	FP32	416	0.171	0.367	0.137	71.5	117.9
YOLOv3-Lite	FP32	416	0.169	0.349	0.144	53.8	63.4
YOLOv3-Lite	FP32	320	0.159	0.326	0.139	55.2	97.5
YOLOv3-Nano	FP32	416	0.129	0.275	0.102	58.0	113.1
YOLOv3-Nano	FP32	320	0.124	0.259	0.106	61.6	156.8
YOLO-Fastest	FP32	416	0.095	0.213	0.068	61.7	104.1
YOLO-Fastest	FP32	320	0.093	0.202	0.074	65.8	143.3
YOLO-FastestXL	FP32	416	0.148	0.308	0.125	62.0	75.9
YOLO-FastestXL	FP32	320	0.141	0.284	0.125	63.9	112.3
YOLOv2	FP32	608	0.297	0.548	0.291	12.1	12.1
YOLOv2-Tiny	FP32	416	0.105	0.255	0.068	34.5	40.7

PyTorch	Precision	Resolution	IoU=0.5:0.95	IoU=0.5	IoU=0.75	FPS (with output)	FPS (without output)
YOLOv5x	FP32	608	0.487	0.676	0.527	8.25	9.49
YOLOv5l	FP32	608	0.471	0.662	0.512	12.67	15.77
YOLOv5m	FP32	608	0.439	0.631	0.474	18.13	24.80
YOLOv5s	FP32	608	0.369	0.567	0.395	28.03	49.52

NVIDIA Jetson Nano (4GB)

JetPack 4.4.1
CUDA 10.2
TensorRT 7.1.3
cuDNN 8.0
OpenCV 4.1.1

TensorRT	Precision	Resolution	IoU=0.5:0.95	IoU=0.5	IoU=0.75	FPS (with display)	FPS (without display)
YOLOv4	FP32	416	0.462	0.694	0.503	2.97	2.99
YOLOv4	FP16	416	0.462	0.694	0.504	4.89	4.96
YOLOv4	FP32	320	0.407	0.625	0.434
YOLOv4	FP16	320	0.408	0.625	0.435
YOLOv3	FP32	416	0.370	0.664	0.379
YOLOv3	FP16	416	0.370	0.664	0.378
YOLOv4-Tiny	FP32	416	0.194	0.378	0.177	21.79	23.23
YOLOv4-Tiny	FP16	416	0.194	0.378	0.177	24.76	26.18
YOLOv3-Tiny-PRN	FP32	416	0.163	0.375	0.120	23.79	25.18
YOLOv3-Tiny-PRN	FP16	416	0.163	0.375	0.119	26.08	27.96
YOLOv3-Tiny	FP32	416	0.162	0.363	0.122	22.84	24.28
YOLOv3-Tiny	FP16	416	0.162	0.363	0.122	25.47	27.18

Darknet	Precision	Resolution
YOLOv4	FP32	416
YOLOv4	FP32	320
YOLOv3	FP32	416
YOLOv4-Tiny	FP32	416
YOLOv3-Tiny-PRN	FP32	416
YOLOv3-Tiny	FP32	416
YOLOv2	FP32	608
YOLOv2-Tiny	FP32	416

PyTorch	Precision	Resolution	IoU=0.5:0.95	IoU=0.5	IoU=0.75	FPS (with output)	FPS (without output)
YOLOv5s	FP32	416
YOLOv5s	FP16	416

DeepStream settings

General

width = 1920
height = 1080
maintain-aspect-ratio = 0
batch-size = 1

Evaluate mAP

valid = val2017 (COCO)
nms-iou-threshold = 0.6
pre-cluster-threshold = 0.001 (CONF_THRESH)

Evaluate FPS and Demo

nms-iou-threshold = 0.45 (NMS; changed to beta_nms when available)
pre-cluster-threshold = 0.25 (CONF_THRESH)

Native TensorRT conversion

Download my native folder, rename to yolo and move to your deepstream/sources folder.

Download cfg and weights files from your model and move to deepstream/sources/yolo folder.

Compile

cd /opt/nvidia/deepstream/deepstream-5.0/sources/yolo
CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo

Edit config_infer_primary.txt for your model (example for YOLOv4)

[property]
...
# 0=RGB, 1=BGR, 2=GRAYSCALE
model-color-format=0
# CFG
custom-network-config=yolov4.cfg
# Weights
model-file=yolov4.weights
# Generated TensorRT model (will be created if it doesn't exist)
model-engine-file=model_b1_gpu0_fp32.engine
# Model labels file
labelfile-path=labels.txt
# Batch size
batch-size=1
# 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
# Number of classes in label file
num-detected-classes=80
...
[class-attrs-all]
# CONF_THRESH
pre-cluster-threshold=0.25

Run

deepstream-app -c deepstream_app_config.txt

If you want to use YOLOv2 or YOLOv2-Tiny models, change, before run, deepstream_app_config.txt

[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV2.txt

Note: config_infer_primary.txt uses cluster-mode=4 and NMS = 0.45 (via code) when beta_nms isn't available (when beta_nms is available, NMS = beta_nms), while config_infer_primary_yoloV2.txt uses cluster-mode=2 and nms-iou-threshold=0.45 to set NMS.

Request native TensorRT conversion for your YOLO-based model

To request moded files for native TensorRT conversion to use in DeepStream SDK, send me the model cfg and weights files via Issues tab.

Note: If your model are listed in native tab, you can use my native folder to run your model in DeepStream.

For commercial DeepStream SDK projects, contact me at email address available in GitHub.

My projects: https://www.youtube.com/MarcosLucianoTV