NVIDIA DeepStream SDK 5.1 configuration for YOLO models
- Darknet CFG params parser (no need to edit nvdsparsebbox_Yolo.cpp or any other file for native models)
- Support for new_coords, beta_nms and scale_x_y params
- Support for new models not supported in the official DeepStream SDK YOLO
- Support for layers not supported in the official DeepStream SDK YOLO
- Support for activations not supported in the official DeepStream SDK YOLO
- Support for convolutional groups
Tutorial
Benchmark
TensorRT conversion
- Native (tested models below)
  - YOLOv4x-Mish
  - YOLOv4-CSP
  - YOLOv4
  - YOLOv4-Tiny
  - YOLOv3-SPP
  - YOLOv3
  - YOLOv3-Tiny-PRN
  - YOLOv3-Tiny
  - YOLOv3-Lite
  - YOLOv3-Nano
  - YOLO-Fastest
  - YOLO-Fastest-XL
  - YOLOv2
  - YOLOv2-Tiny
- External
Request
- NVIDIA DeepStream SDK 5.1
- DeepStream-Yolo Native (for Darknet YOLO based models)
- DeepStream-Yolo External (for PyTorch YOLOv5 based model)
DeepStream SDK YOLOv4: https://youtu.be/Qi_F_IYpuFQ
Darknet YOLOv4: https://youtu.be/AxJJ9fnJ7Xk
NVIDIA GTX 1050 (4GB Mobile)
CUDA 10.2
Driver 440.33
TensorRT 7.2.1
cuDNN 8.0.5
OpenCV 3.2.0 (libopencv-dev)
OpenCV Python 4.4.0 (opencv-python)
PyTorch 1.7.0
Torchvision 0.8.1
TensorRT | Precision | Resolution | IoU=0.5:0.95 | IoU=0.5 | IoU=0.75 | FPS (with display) | FPS (without display) |
---|---|---|---|---|---|---|---|
YOLOv5x | FP32 | 608 | 0.406 | 0.562 | 0.441 | 7.91 | 7.99 |
YOLOv5l | FP32 | 608 | 0.385 | 0.540 | 0.419 | 12.82 | 12.97 |
YOLOv5m | FP32 | 608 | 0.354 | 0.507 | 0.388 | 25.09 | 25.97 |
YOLOv5s | FP32 | 608 | 0.281 | 0.430 | 0.307 | 52.02 | 56.21 |
YOLOv4x-MISH | FP32 | 640 | 0.454 | 0.644 | 0.491 | 7.45 | 7.56 |
YOLOv4x-MISH | FP32 | 608 | 0.450 | 0.644 | 0.482 | 7.93 | 8.05 |
YOLOv4-CSP | FP32 | 608 | 0.434 | 0.628 | 0.465 | 13.74 | 14.11 |
YOLOv4-CSP | FP32 | 512 | 0.427 | 0.618 | 0.459 | 21.69 | 22.75 |
YOLOv4 | FP32 | 608 | 0.490 | 0.734 | 0.538 | 11.72 | 12.09 |
YOLOv4 | FP32 | 512 | 0.484 | 0.725 | 0.533 | 19.00 | 19.70 |
YOLOv4 | FP32 | 416 | 0.456 | 0.693 | 0.491 | 22.63 | 23.81 |
YOLOv4 | FP32 | 320 | 0.400 | 0.623 | 0.424 | 32.46 | 35.07 |
YOLOv3-SPP | FP32 | 608 | 0.411 | 0.680 | 0.436 | 11.85 | 12.12 |
YOLOv3 | FP32 | 608 | 0.374 | 0.654 | 0.387 | 12.00 | 12.33 |
YOLOv3 | FP32 | 416 | 0.369 | 0.651 | 0.379 | 23.19 | 24.55 |
YOLOv4-Tiny | FP32 | 416 | 0.195 | 0.382 | 0.175 | 144.55 | 176.31 |
YOLOv3-Tiny-PRN | FP32 | 416 | 0.168 | 0.369 | 0.130 | 181.71 | 244.47 |
YOLOv3-Tiny | FP32 | 416 | 0.165 | 0.357 | 0.128 | 154.19 | 190.42 |
YOLOv3-Lite | FP32 | 416 | 0.165 | 0.350 | 0.131 | 122.40 | 146.19 |
YOLOv3-Lite | FP32 | 320 | 0.155 | 0.324 | 0.128 | 163.76 | 204.21 |
YOLOv3-Nano | FP32 | 416 | 0.127 | 0.277 | 0.098 | 191.77 | 264.59 |
YOLOv3-Nano | FP32 | 320 | 0.122 | 0.258 | 0.099 | 207.04 | 269.89 |
YOLO-Fastest | FP32 | 416 | 0.092 | 0.213 | 0.062 | 174.26 | 221.05 |
YOLO-Fastest | FP32 | 320 | 0.090 | 0.201 | 0.068 | 199.48 | 258.56 |
YOLO-FastestXL | FP32 | 416 | 0.144 | 0.306 | 0.115 | 121.89 | 145.13 |
YOLO-FastestXL | FP32 | 320 | 0.136 | 0.279 | 0.117 | 162.65 | 199.75 |
YOLOv2 | FP32 | 608 | 0.286 | 0.534 | 0.274 | 23.92 | 25.47 |
YOLOv2-Tiny | FP32 | 416 | 0.103 | 0.251 | 0.064 | 165.01 | 203.02 |
Darknet | Precision | Resolution | IoU=0.5:0.95 | IoU=0.5 | IoU=0.75 | FPS (with display) | FPS (without display) |
---|---|---|---|---|---|---|---|
YOLOv4x-MISH | FP32 | 640 | 0.495 | 0.682 | 0.538 | 5.3 | 5.5 |
YOLOv4x-MISH | FP32 | 608 | 0.493 | 0.680 | 0.535 | 5.4 | 5.6 |
YOLOv4-CSP | FP32 | 608 | 0.473 | 0.661 | 0.515 | 9.2 | 9.5 |
YOLOv4-CSP | FP32 | 512 | 0.458 | 0.645 | 0.496 | 13.6 | 14.0 |
YOLOv4 | FP32 | 608 | 0.513 | 0.748 | 0.574 | 7.3 | 7.5 |
YOLOv4 | FP32 | 512 | 0.506 | 0.738 | 0.564 | 11.8 | 12.3 |
YOLOv4 | FP32 | 416 | 0.479 | 0.709 | 0.527 | 15.4 | 15.8 |
YOLOv4 | FP32 | 320 | 0.421 | 0.638 | 0.454 | 21.0 | 21.7 |
YOLOv3-SPP | FP32 | 608 | 0.432 | 0.701 | 0.465 | 6.9 | 7.1 |
YOLOv3 | FP32 | 608 | 0.391 | 0.672 | 0.412 | 7.0 | 7.3 |
YOLOv3 | FP32 | 416 | 0.384 | 0.668 | 0.402 | 16.3 | 16.9 |
YOLOv4-Tiny | FP32 | 416 | 0.203 | 0.388 | 0.189 | 68.0 | 112.5 |
YOLOv3-Tiny-PRN | FP32 | 416 | 0.172 | 0.378 | 0.133 | 71.6 | 143.9 |
YOLOv3-Tiny | FP32 | 416 | 0.171 | 0.367 | 0.137 | 71.5 | 117.9 |
YOLOv3-Lite | FP32 | 416 | 0.169 | 0.349 | 0.144 | 53.8 | 63.4 |
YOLOv3-Lite | FP32 | 320 | 0.159 | 0.326 | 0.139 | 55.2 | 97.5 |
YOLOv3-Nano | FP32 | 416 | 0.129 | 0.275 | 0.102 | 58.0 | 113.1 |
YOLOv3-Nano | FP32 | 320 | 0.124 | 0.259 | 0.106 | 61.6 | 156.8 |
YOLO-Fastest | FP32 | 416 | 0.095 | 0.213 | 0.068 | 61.7 | 104.1 |
YOLO-Fastest | FP32 | 320 | 0.093 | 0.202 | 0.074 | 65.8 | 143.3 |
YOLO-FastestXL | FP32 | 416 | 0.148 | 0.308 | 0.125 | 62.0 | 75.9 |
YOLO-FastestXL | FP32 | 320 | 0.141 | 0.284 | 0.125 | 63.9 | 112.3 |
YOLOv2 | FP32 | 608 | 0.297 | 0.548 | 0.291 | 12.1 | 12.1 |
YOLOv2-Tiny | FP32 | 416 | 0.105 | 0.255 | 0.068 | 34.5 | 40.7 |
PyTorch | Precision | Resolution | IoU=0.5:0.95 | IoU=0.5 | IoU=0.75 | FPS (with output) | FPS (without output) |
---|---|---|---|---|---|---|---|
YOLOv5x | FP32 | 608 | 0.487 | 0.676 | 0.527 | 8.25 | 9.49 |
YOLOv5l | FP32 | 608 | 0.471 | 0.662 | 0.512 | 12.67 | 15.77 |
YOLOv5m | FP32 | 608 | 0.439 | 0.631 | 0.474 | 18.13 | 24.80 |
YOLOv5s | FP32 | 608 | 0.369 | 0.567 | 0.395 | 28.03 | 49.52 |
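The TensorRT and Darknet tables above can be compared row by row. A minimal Python sketch, using a few FPS (without display) values copied from the GTX 1050 tables; the selection of rows and the names `TENSORRT_FPS`, `DARKNET_FPS` and `speedup` are illustrative only:

```python
# FPS (without display), FP32, copied from the GTX 1050 tables above.
# Keys are (model, resolution); the choice of rows is arbitrary.
TENSORRT_FPS = {
    ("YOLOv4", 608): 12.09,
    ("YOLOv4", 416): 23.81,
    ("YOLOv4-Tiny", 416): 176.31,
}
DARKNET_FPS = {
    ("YOLOv4", 608): 7.5,
    ("YOLOv4", 416): 15.8,
    ("YOLOv4-Tiny", 416): 112.5,
}


def speedup(model, resolution):
    """Return TensorRT FPS divided by Darknet FPS for one table row."""
    key = (model, resolution)
    return TENSORRT_FPS[key] / DARKNET_FPS[key]


for model, resolution in TENSORRT_FPS:
    print(f"{model} @ {resolution}: {speedup(model, resolution):.2f}x")
```

Note that the accuracy columns differ slightly between the two tables as well, so a speed ratio alone does not describe the full trade-off.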
NVIDIA Jetson Nano (4GB)
JetPack 4.4.1
CUDA 10.2
TensorRT 7.1.3
cuDNN 8.0
OpenCV 4.1.1
TensorRT | Precision | Resolution | IoU=0.5:0.95 | IoU=0.5 | IoU=0.75 | FPS (with display) | FPS (without display) |
---|---|---|---|---|---|---|---|
YOLOv4 | FP32 | 416 | 0.462 | 0.694 | 0.503 | 2.97 | 2.99 |
YOLOv4 | FP16 | 416 | 0.462 | 0.694 | 0.504 | 4.89 | 4.96 |
YOLOv4 | FP32 | 320 | 0.407 | 0.625 | 0.434 | | |
YOLOv4 | FP16 | 320 | 0.408 | 0.625 | 0.435 | | |
YOLOv3 | FP32 | 416 | 0.370 | 0.664 | 0.379 | | |
YOLOv3 | FP16 | 416 | 0.370 | 0.664 | 0.378 | | |
YOLOv4-Tiny | FP32 | 416 | 0.194 | 0.378 | 0.177 | 21.79 | 23.23 |
YOLOv4-Tiny | FP16 | 416 | 0.194 | 0.378 | 0.177 | 24.76 | 26.18 |
YOLOv3-Tiny-PRN | FP32 | 416 | 0.163 | 0.375 | 0.120 | 23.79 | 25.18 |
YOLOv3-Tiny-PRN | FP16 | 416 | 0.163 | 0.375 | 0.119 | 26.08 | 27.96 |
YOLOv3-Tiny | FP32 | 416 | 0.162 | 0.363 | 0.122 | 22.84 | 24.28 |
YOLOv3-Tiny | FP16 | 416 | 0.162 | 0.363 | 0.122 | 25.47 | 27.18 |
Darknet | Precision | Resolution | IoU=0.5:0.95 | IoU=0.5 | IoU=0.75 | FPS (with display) | FPS (without display) |
---|---|---|---|---|---|---|---|
YOLOv4 | FP32 | 416 | | | | | |
YOLOv4 | FP32 | 320 | | | | | |
YOLOv3 | FP32 | 416 | | | | | |
YOLOv4-Tiny | FP32 | 416 | | | | | |
YOLOv3-Tiny-PRN | FP32 | 416 | | | | | |
YOLOv3-Tiny | FP32 | 416 | | | | | |
YOLOv2 | FP32 | 608 | | | | | |
YOLOv2-Tiny | FP32 | 416 | | | | | |
PyTorch | Precision | Resolution | IoU=0.5:0.95 | IoU=0.5 | IoU=0.75 | FPS (with output) | FPS (without output) |
---|---|---|---|---|---|---|---|
YOLOv5s | FP32 | 416 | | | | | |
YOLOv5s | FP16 | 416 | | | | | |
- General
width = 1920
height = 1080
maintain-aspect-ratio = 0
batch-size = 1
- Evaluate mAP
valid = val2017 (COCO)
nms-iou-threshold = 0.6
pre-cluster-threshold = 0.001 (CONF_THRESH)
- Evaluate FPS and Demo
nms-iou-threshold = 0.45 (NMS; changed to beta_nms when available)
pre-cluster-threshold = 0.25 (CONF_THRESH)
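How the two thresholds above interact can be sketched in plain Python: detections below the confidence threshold are dropped first, then greedy non-maximum suppression removes any box that overlaps a higher-scoring box by more than the IoU threshold. This illustrates the general technique only, not the code that DeepStream or this repository runs; the function names, the single-class simplification and the (left, top, width, height, score) box format are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (left, top, width, height) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    inter_w = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    inter_h = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets, conf_thresh=0.25, nms_iou_thresh=0.45):
    """dets: list of (left, top, width, height, score) for a single class."""
    # Step 1: confidence filter (the role of pre-cluster-threshold).
    kept = [d for d in dets if d[4] >= conf_thresh]
    # Step 2: greedy NMS (the role of nms-iou-threshold).
    kept.sort(key=lambda d: d[4], reverse=True)
    result = []
    for det in kept:
        if all(iou(det[:4], r[:4]) <= nms_iou_thresh for r in result):
            result.append(det)
    return result


dets = [
    (100, 100, 50, 50, 0.90),
    (105, 105, 50, 50, 0.80),  # overlaps the first box heavily
    (300, 300, 40, 40, 0.60),
    (10, 10, 20, 20, 0.10),    # below the confidence threshold
]
print(filter_detections(dets))
```

With the mAP settings (0.001 and 0.6) far more low-confidence boxes survive the first step, which is what a precision/recall evaluation needs; the FPS and demo settings (0.25 and 0.45) keep only the boxes worth drawing.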
Run command
sudo chmod -R 777 /opt/nvidia/deepstream/deepstream-5.1/sources/
Download my native folder, rename it to yolo and move it to your deepstream/sources folder.
Download the cfg and weights files for your model and move them to the deepstream/sources/yolo folder.
- YOLOv4x-Mish [cfg] [weights]
- YOLOv4-CSP [cfg] [weights]
- YOLOv4 [cfg] [weights]
- YOLOv4-Tiny [cfg] [weights]
- YOLOv3-SPP [cfg] [weights]
- YOLOv3 [cfg] [weights]
- YOLOv3-Tiny-PRN [cfg] [weights]
- YOLOv3-Tiny [cfg] [weights]
- YOLOv3-Lite [cfg] [weights]
- YOLOv3-Nano [cfg] [weights]
- YOLO-Fastest [cfg] [weights]
- YOLO-Fastest-XL [cfg] [weights]
- YOLOv2 [cfg] [weights]
- YOLOv2-Tiny [cfg] [weights]
Compile
- x86 platform
cd /opt/nvidia/deepstream/deepstream-5.1/sources/yolo
CUDA_VER=11.1 make -C nvdsinfer_custom_impl_Yolo
- Jetson platform
cd /opt/nvidia/deepstream/deepstream-5.1/sources/yolo
CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo
Edit config_infer_primary.txt for your model (example for YOLOv4)
[property]
...
# 0=RGB, 1=BGR, 2=GRAYSCALE
model-color-format=0
# CFG
custom-network-config=yolov4.cfg
# Weights
model-file=yolov4.weights
# Generated TensorRT model (will be created if it doesn't exist)
model-engine-file=model_b1_gpu0_fp32.engine
# Model labels file
labelfile-path=labels.txt
# Batch size
batch-size=1
# 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
# Number of classes in label file
num-detected-classes=80
...
[class-attrs-all]
# CONF_THRESH
pre-cluster-threshold=0.25
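The precision-related keys above have to agree with each other: network-mode selects the precision, and the engine file name in the example encodes batch size, GPU id and precision. A small Python sketch that derives them together; the helper name `precision_settings` is hypothetical, and the naming pattern `model_b{batch}_gpu{gpu}_{precision}.engine` is generalized from the single example above (model_b1_gpu0_fp32.engine), so treat it as an assumption:

```python
# network-mode values as listed in the config comment: 0=FP32, 1=INT8, 2=FP16
NETWORK_MODE = {"fp32": 0, "int8": 1, "fp16": 2}


def precision_settings(precision, batch_size=1, gpu_id=0):
    """Return the [property] keys that depend on the chosen precision."""
    precision = precision.lower()
    if precision not in NETWORK_MODE:
        raise ValueError(f"unsupported precision: {precision}")
    return {
        "network-mode": NETWORK_MODE[precision],
        "batch-size": batch_size,
        # Assumed pattern, generalized from model_b1_gpu0_fp32.engine.
        "model-engine-file": f"model_b{batch_size}_gpu{gpu_id}_{precision}.engine",
    }


for key, value in precision_settings("FP16").items():
    print(f"{key}={value}")
```

The printed lines can be pasted into the [property] section in place of the FP32 values shown above.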
Run
deepstream-app -c deepstream_app_config.txt
If you want to use YOLOv2 or YOLOv2-Tiny models, edit deepstream_app_config.txt before running:
[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV2.txt
Note: config_infer_primary.txt uses cluster-mode=4 and NMS = 0.45 (via code) when beta_nms isn't available (when beta_nms is available, NMS = beta_nms), while config_infer_primary_yoloV2.txt uses cluster-mode=2 and nms-iou-threshold=0.45 to set NMS.
To request modified files for native TensorRT conversion to use in DeepStream SDK, send me the model's cfg and weights files via the Issues tab.
Note: If your model is listed under Native, you can use my native folder to run your model in DeepStream.
You can get metadata from DeepStream in Python and C++. For C++, you need to edit the deepstream-app or deepstream-test code. For Python, you need to install and edit deepstream_python_apps.
You need to work with NvDsObjectMeta (Python/C++), NvDsFrameMeta (Python/C++) and NvOSD_RectParams (Python/C++) to get the label, position, etc. of the bounding boxes.
In the C++ deepstream-app application, your code needs to be in the analytics_done_buf_prob function. In the C++/Python deepstream-test applications, your code needs to be in the osd_sink_pad_buffer_probe/tiler_src_pad_buffer_probe function.
Python is slightly slower than C (about 5-10%).
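A minimal Python sketch of such a probe, assuming the pyds bindings from deepstream_python_apps are installed. The helper `rect_to_bbox` and the printed fields are illustrative choices; the DeepStream imports are deferred into the probe so the helper can be loaded on a machine without DeepStream.

```python
def rect_to_bbox(left, top, width, height):
    """Convert NvOSD_RectParams-style fields to (x1, y1, x2, y2)."""
    return (left, top, left + width, top + height)


def osd_sink_pad_buffer_probe(pad, info, u_data):
    # Deferred imports: this module loads without DeepStream installed.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst
    import pyds

    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK

    # Batch metadata attached by the DeepStream elements upstream.
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)
            rect = obj_meta.rect_params  # NvOSD_RectParams
            bbox = rect_to_bbox(rect.left, rect.top, rect.width, rect.height)
            print(frame_meta.frame_num, obj_meta.obj_label,
                  obj_meta.confidence, bbox)
            try:
                l_obj = l_obj.next
            except StopIteration:
                break
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK
```

In the deepstream-test style applications the probe is attached to the sink pad of the on-screen display element, for example with `osdsinkpad.add_probe(Gst.PadProbeType.BUFFER, osd_sink_pad_buffer_probe, 0)`.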
This code is open-source. You can use it as you want. :)
If you want me to create commercial DeepStream SDK projects for you, contact me at the email address available on GitHub.
My projects: https://www.youtube.com/MarcosLucianoTV