DeepStream-Yolo

NVIDIA DeepStream SDK 5.1 configuration for YOLO models

Improvements on this repository

  • Darknet CFG params parser (no need to edit nvdsparsebbox_Yolo.cpp or any other file for native models)
  • Support for new_coords, beta_nms and scale_x_y params
  • Support for new models that aren't supported in the official DeepStream SDK YOLO
  • Support for layers that aren't supported in the official DeepStream SDK YOLO
  • Support for activations that aren't supported in the official DeepStream SDK YOLO
  • Support for convolutional groups
  • Support for INT8 calibration (not available for YOLOv5 models)
  • Support for non-square models

Tutorial

TensorRT conversion

Benchmark

Requirements

Basic usage

git clone https://github.com/marcoslucianops/DeepStream-Yolo.git
cd DeepStream-Yolo/native

Download the cfg and weights files for your model and move them to the DeepStream-Yolo/native folder

Compile

  • x86 platform
CUDA_VER=11.1 make -C nvdsinfer_custom_impl_Yolo
  • Jetson platform
CUDA_VER=10.2 make -C nvdsinfer_custom_impl_Yolo

Edit config_infer_primary.txt for your model (example for YOLOv4)

[property]
...
# 0=RGB, 1=BGR, 2=GRAYSCALE
model-color-format=0
# CFG
custom-network-config=yolov4.cfg
# Weights
model-file=yolov4.weights
# Generated TensorRT model (will be created if it doesn't exist)
model-engine-file=model_b1_gpu0_fp32.engine
# Model labels file
labelfile-path=labels.txt
# Batch size
batch-size=1
# 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
# Number of classes in label file
num-detected-classes=80
...
[class-attrs-all]
# CONF_THRESH
pre-cluster-threshold=0.25
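
The keys in this file have to agree with each other and with the labels file. As a rough illustration, the sketch below checks two of those relationships in Python. The helper name check_infer_config is hypothetical and not part of this repository, and it assumes the config parses as a plain INI-style key file.

```python
import configparser


def check_infer_config(config_text, label_lines):
    """Return a list of warnings for an nvinfer-style config (hypothetical helper).

    config_text: contents of a file such as config_infer_primary.txt
    label_lines: lines of the labels file, one class name per line
    """
    parser = configparser.ConfigParser(
        comment_prefixes=("#",), interpolation=None, strict=False
    )
    parser.read_string(config_text)
    prop = parser["property"]
    warnings = []

    # num-detected-classes should match the number of non-empty label lines.
    num_classes = prop.getint("num-detected-classes")
    labels = [line for line in label_lines if line.strip()]
    if num_classes != len(labels):
        warnings.append(
            f"num-detected-classes={num_classes} but labels file has {len(labels)} entries"
        )

    # network-mode values as documented above: 0=FP32, 1=INT8, 2=FP16.
    mode = prop.getint("network-mode")
    if mode not in (0, 1, 2):
        warnings.append(f"unknown network-mode={mode}")
    # Assumption: INT8 mode is used together with int8-calib-file,
    # as in the INT8 calibration section of this README.
    if mode == 1 and "int8-calib-file" not in prop:
        warnings.append("network-mode=1 (INT8) but int8-calib-file is not set")

    return warnings
```

Usage: read config_infer_primary.txt and labels.txt, pass their contents to check_infer_config, and print any returned warnings before running deepstream-app.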

Run

deepstream-app -c deepstream_app_config.txt

To use YOLOv2 or YOLOv2-Tiny models, change deepstream_app_config.txt before running

[primary-gie]
enable=1
gpu-id=0
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV2.txt

Note: config_infer_primary.txt uses cluster-mode=4 and applies NMS = 0.45 in code when the cfg file has no beta_nms (when beta_nms is set, NMS = beta_nms). config_infer_primary_yoloV2.txt uses cluster-mode=2 and sets NMS through nms-iou-threshold=0.45.
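
To make the meaning of the 0.45 threshold concrete, here is a minimal Python sketch of greedy per-class NMS. It is an illustration only and not the code used by this repository or by DeepStream; the function names and the (left, top, width, height) box layout are assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (left, top, width, height)."""
    inter_w = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.45):
    """Greedy per-class NMS.

    detections: list of (box, score, class_id) tuples.
    A detection is kept unless a higher-scoring kept detection of the
    same class overlaps it with IoU above iou_threshold.
    """
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        box, _, class_id = det
        if all(class_id != k[2] or iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept
```

A lower threshold suppresses more overlapping boxes; a higher one keeps more of them.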

INT8 calibration

Install OpenCV

sudo apt-get install libopencv-dev

Compile/recompile the nvdsinfer_custom_impl_Yolo lib with OpenCV support

  • x86 platform
cd DeepStream-Yolo/native
CUDA_VER=11.1 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo
  • Jetson platform
cd DeepStream-Yolo/native
CUDA_VER=10.2 OPENCV=1 make -C nvdsinfer_custom_impl_Yolo

For the COCO dataset, download val2017, extract it, and move it to the DeepStream-Yolo/native folder

Select 1000 random images from COCO dataset to run calibration

mkdir calibration
for jpg in $(ls -1 val2017/*.jpg | sort -R | head -1000); do \
    cp ${jpg} calibration/; \
done

Create the calibration.txt file with all selected images

realpath calibration/*jpg > calibration.txt
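
The image selection and list creation above can also be done in one step from Python, which may be convenient on systems without sort -R or realpath. This is a sketch only; build_calibration_set and its parameters are hypothetical names, not part of this repository.

```python
import random
import shutil
from pathlib import Path


def build_calibration_set(src_dir, dst_dir, list_file, count=1000, seed=None):
    """Copy up to `count` random .jpg files and write their absolute paths.

    src_dir:   folder with the source images (e.g. val2017)
    dst_dir:   folder that receives the copies (e.g. calibration)
    list_file: text file with one absolute image path per line
    """
    src, dst = Path(src_dir), Path(dst_dir)
    images = sorted(src.glob("*.jpg"))
    chosen = random.Random(seed).sample(images, min(count, len(images)))

    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for image in chosen:
        target = dst / image.name
        shutil.copy(image, target)
        copied.append(str(target.resolve()))

    lines = sorted(copied)
    Path(list_file).write_text("\n".join(lines) + "\n")
    return lines
```

Usage, under the folder layout described above: build_calibration_set("val2017", "calibration", "calibration.txt").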

Set environment variables

export INT8_CALIB_IMG_PATH=calibration.txt
export INT8_CALIB_BATCH_SIZE=1

Change config_infer_primary.txt file

...
model-engine-file=model_b1_gpu0_fp32.engine
#int8-calib-file=calib.table
...
network-mode=0
...

To

...
model-engine-file=model_b1_gpu0_int8.engine
int8-calib-file=calib.table
...
network-mode=1
...
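
The engine file names in this README appear to encode the batch size, GPU id and precision. Assuming that pattern holds (it is inferred from the two names shown here, not from a documented rule), a small Python helper can keep model-engine-file in sync with network-mode. The function name is hypothetical.

```python
# network-mode values as listed in config_infer_primary.txt: 0=FP32, 1=INT8, 2=FP16
PRECISION_BY_NETWORK_MODE = {0: "fp32", 1: "int8", 2: "fp16"}


def engine_file_name(batch_size, gpu_id, network_mode):
    """Build an engine file name such as model_b1_gpu0_int8.engine.

    Assumption: the name follows model_b<batch>_gpu<id>_<precision>.engine,
    as suggested by the examples in this README.
    """
    precision = PRECISION_BY_NETWORK_MODE[network_mode]
    return f"model_b{batch_size}_gpu{gpu_id}_{precision}.engine"
```

For example, engine_file_name(1, 0, 1) gives the INT8 name used in the edited config above.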

Run

deepstream-app -c deepstream_app_config.txt

Note: NVIDIA recommends at least 500 images to get good accuracy. In this example I used 1000 images to get better accuracy (more images generally give better accuracy). Higher INT8_CALIB_BATCH_SIZE values will increase the accuracy and calibration speed. Set it according to your GPU memory. This process can take a long time. Calibration isn't available for YOLOv5 models.

mAP/FPS comparison between models

valid = val2017 (COCO)
NMS = 0.45 (changed to beta_nms when used in Darknet cfg file) / 0.6 (YOLOv5 models)
pre-cluster-threshold = 0.001 (mAP eval) / 0.25 (FPS measurement)
batch-size = 1
FPS measurement display width = 1920
FPS measurement display height = 1080
NOTE: An NVIDIA GTX 1050 (4GB Mobile) was used for evaluation. maintain-aspect-ratio=1 was set in the config_infer file for YOLOv4 (with letter_box=1) and YOLOv5 models. INT8 calibration used 1000 random images from val2017 (COCO) and INT8_CALIB_BATCH_SIZE=1.
| TensorRT model | Precision | Resolution | IoU=0.5:0.95 | IoU=0.5 | IoU=0.75 | FPS (with display) | FPS (without display) |
|---|---|---|---|---|---|---|---|
| YOLOv5x 5.0 | FP32 | 640 | 0. | 0. | 0. | . | . |
| YOLOv5l 5.0 | FP32 | 640 | 0. | 0. | 0. | . | . |
| YOLOv5m 5.0 | FP32 | 640 | 0. | 0. | 0. | . | . |
| YOLOv5s 5.0 | FP32 | 640 | 0. | 0. | 0. | . | . |
| YOLOv5s 5.0 | FP32 | 416 | 0. | 0. | 0. | . | . |
| YOLOv4x-MISH | FP32 | 640 | 0.461 | 0.649 | 0.499 | . | . |
| YOLOv4x-MISH | INT8 | 640 | 0.443 | 0.629 | 0.479 | . | . |
| YOLOv4x-MISH | FP32 | 608 | 0.461 | 0.650 | 0.496 | . | . |
| YOLOv4-CSP | FP32 | 640 | 0.443 | 0.632 | 0.477 | . | . |
| YOLOv4-CSP | FP32 | 608 | 0.443 | 0.632 | 0.477 | . | . |
| YOLOv4-CSP | FP32 | 512 | 0.437 | 0.625 | 0.471 | . | . |
| YOLOv4-CSP | INT8 | 512 | 0.414 | 0.601 | 0.447 | . | . |
| YOLOv4 | FP32 | 640 | 0.492 | 0.729 | 0.547 | . | . |
| YOLOv4 | FP32 | 608 | 0.499 | 0.739 | 0.551 | . | . |
| YOLOv4 | INT8 | 608 | 0.483 | 0.728 | 0.534 | . | . |
| YOLOv4 | FP32 | 512 | 0.492 | 0.730 | 0.542 | . | . |
| YOLOv4 | FP32 | 416 | 0.468 | 0.702 | 0.507 | . | . |
| YOLOv3-SPP | FP32 | 608 | 0.412 | 0.687 | 0.434 | . | . |
| YOLOv3 | FP32 | 608 | 0.378 | 0.674 | 0.389 | . | . |
| YOLOv3 | INT8 | 608 | 0.381 | 0.677 | 0.388 | . | . |
| YOLOv3 | FP32 | 416 | 0.373 | 0.669 | 0.379 | . | . |
| YOLOv2 | FP32 | 608 | 0.211 | 0.365 | 0.220 | . | . |
| YOLOv2 | FP32 | 416 | 0.207 | 0.362 | 0.211 | . | . |
| YOLOv4-Tiny | FP32 | 416 | 0.216 | 0.403 | 0.207 | . | . |
| YOLOv4-Tiny | INT8 | 416 | 0.203 | 0.385 | 0.192 | . | . |
| YOLOv3-Tiny-PRN | FP32 | 416 | 0.168 | 0.381 | 0.126 | . | . |
| YOLOv3-Tiny-PRN | INT8 | 416 | 0.155 | 0.358 | 0.113 | . | . |
| YOLOv3-Tiny | FP32 | 416 | 0.096 | 0.203 | 0.080 | . | . |
| YOLOv2-Tiny | FP32 | 416 | 0.084 | 0.194 | 0.062 | . | . |
| YOLOv3-Lite | FP32 | 416 | 0.169 | 0.356 | 0.137 | . | . |
| YOLOv3-Lite | FP32 | 320 | 0.158 | 0.328 | 0.132 | . | . |
| YOLOv3-Nano | FP32 | 416 | 0.128 | 0.278 | 0.099 | . | . |
| YOLOv3-Nano | FP32 | 320 | 0.122 | 0.260 | 0.099 | . | . |
| YOLO-Fastest-XL | FP32 | 416 | 0.160 | 0.342 | 0.130 | . | . |
| YOLO-Fastest-XL | FP32 | 320 | 0.158 | 0.329 | 0.135 | . | . |
| YOLO-Fastest | FP32 | 416 | 0.101 | 0.230 | 0.072 | . | . |
| YOLO-Fastest | FP32 | 320 | 0.102 | 0.232 | 0.073 | . | . |

Extract metadata

You can get metadata from DeepStream in Python and C++. For C++, you need to edit the deepstream-app or deepstream-test code. For Python, you need to install and edit deepstream_python_apps.

You need to work with NvDsObjectMeta (Python/C++), NvDsFrameMeta (Python/C++) and NvOSD_RectParams (Python/C++) to get the label, position, etc. of the bounding boxes.

In the C++ deepstream-app application, your code needs to be in the analytics_done_buf_prob function. In the C++/Python deepstream-test applications, your code needs to be in the osd_sink_pad_buffer_probe/tiler_src_pad_buffer_probe function.
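
As a rough Python illustration of the probe described above, the sketch below walks the frame and object metadata lists and collects the label, confidence and box of each detection. It follows the traversal pattern used in the deepstream_python_apps samples, but it is a minimal sketch and not code from this repository. The helper names iter_meta_list and collect_detections are hypothetical, and the pyds and Gst imports are placed inside the probe so the helpers can be read and tried without DeepStream installed.

```python
def iter_meta_list(node, cast):
    """Walk a DeepStream metadata linked list, casting each node's data."""
    while node is not None:
        try:
            yield cast(node.data)
            node = node.next
        except StopIteration:
            break


def collect_detections(batch_meta, cast_frame, cast_obj):
    """Return one dict per detected object in the batch."""
    detections = []
    for frame_meta in iter_meta_list(batch_meta.frame_meta_list, cast_frame):
        for obj_meta in iter_meta_list(frame_meta.obj_meta_list, cast_obj):
            rect = obj_meta.rect_params  # NvOSD_RectParams
            detections.append({
                "frame": frame_meta.frame_num,
                "label": obj_meta.obj_label,
                "class_id": obj_meta.class_id,
                "confidence": obj_meta.confidence,
                "left": rect.left,
                "top": rect.top,
                "width": rect.width,
                "height": rect.height,
            })
    return detections


def osd_sink_pad_buffer_probe(pad, info, u_data):
    """Pad probe that prints every detection in the current buffer."""
    # Imported here so the helpers above do not require DeepStream.
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst
    import pyds

    gst_buffer = info.get_buffer()
    if gst_buffer is None:
        return Gst.PadProbeReturn.OK

    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    for det in collect_detections(
        batch_meta, pyds.NvDsFrameMeta.cast, pyds.NvDsObjectMeta.cast
    ):
        print(det)
    return Gst.PadProbeReturn.OK
```

In a deepstream-test style application, the probe would be attached to the sink pad of the OSD element with add_probe(Gst.PadProbeType.BUFFER, osd_sink_pad_buffer_probe, 0).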

Python is slightly slower than C (about 5-10%).

This code is open-source. You can use it as you want. :)

My projects: https://www.youtube.com/MarcosLucianoTV