Use the DeepStream Python API to extract the model output tensor and customize the post-processing of YOLO-Pose

Primary language: Python. License: Apache License 2.0 (Apache-2.0).

Deepstream-YOLO-Pose

Multistream_4_YOLOv8s-pose-3.PNG
YOLO-Pose accelerated with TensorRT and multi-streaming with the DeepStream SDK

System Requirements

  • Python 3.8
    • Should already be installed with Ubuntu 20.04
  • Ubuntu 20.04
  • CUDA 11.4 (Jetson)
  • TensorRT 8+

DeepStream 6.x on x86 platform

DeepStream 6.x on Jetson platform

DeepStream Python Bindings

Gst-python and GstRtspServer

  • Installing GstRtspServer and introspection typelib

    sudo apt update
    sudo apt install python3-gi python3-dev python3-gst-1.0 -y
    sudo apt-get install libgstrtspserver-1.0-0 gstreamer1.0-rtsp
    

    For gst-rtsp-server (and other GStreamer components) to be accessible in Python through gi.require_version(), it must be built with gobject-introspection enabled (libgstrtspserver-1.0-0 already is). The introspection typelib package still has to be installed separately:

    sudo apt-get install libgirepository1.0-dev
    sudo apt-get install gobject-introspection gir1.2-gst-rtsp-server-1.0
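
A quick way to confirm that the typelib is visible to Python is to try loading it through PyGObject. The following is a minimal sketch, not part of this repository; the helper name `rtsp_server_available` is hypothetical.

```python
def rtsp_server_available():
    """Return True if the GstRtspServer typelib can be loaded through PyGObject."""
    try:
        import gi
        gi.require_version("Gst", "1.0")
        gi.require_version("GstRtspServer", "1.0")
        from gi.repository import Gst, GstRtspServer  # noqa: F401
    except (ImportError, ValueError):
        # ImportError: PyGObject (python3-gi) or the namespace is missing.
        # ValueError: the requested typelib version is not installed.
        return False
    return True


available = rtsp_server_available()
print("GstRtspServer available:", available)
```

If this prints `False` after the packages above were installed, the typelib package (gir1.2-gst-rtsp-server-1.0) is the first thing to re-check.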
    

Prepare YOLO-Pose Model

netron_yolov8s-pose_dy_onnx.PNG
YOLO-pose architecture

source : YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss

Prepare YOLOv8 TensorRT Engine

  • Choose yolov8-pose for better operator optimization of ONNX model

  • Based on triple-Mu/YOLOv8-TensorRT/Pose.md

  • The yolov8-pose model conversion route is : YOLOv8 PyTorch model -> ONNX -> TensorRT Engine

    Notice !!! ⚠️ This repository does not support building the engine with the TensorRT API !!!

0. Get yolov8s-pose.pt

https://github.com/ultralytics/ultralytics

Benchmark of YOLOv8-Pose

See Pose Docs for usage examples with these models.

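| Model           | size (pixels) | mAPpose 50-95 | mAPpose 50 | Speed CPU ONNX (ms) | Speed A100 TensorRT (ms) | params (M) | FLOPs (B) |
|-----------------|---------------|---------------|------------|---------------------|--------------------------|------------|-----------|
| YOLOv8n-pose    | 640           | 50.4          | 80.1       | 131.8               | 1.18                     | 3.3        | 9.2       |
| YOLOv8s-pose    | 640           | 60.0          | 86.2       | 233.2               | 1.42                     | 11.6       | 30.2      |
| YOLOv8m-pose    | 640           | 65.0          | 88.8       | 456.3               | 2.00                     | 26.4       | 81.0      |
| YOLOv8l-pose    | 640           | 67.6          | 90.0       | 784.5               | 2.59                     | 44.4       | 168.6     |
| YOLOv8x-pose    | 640           | 69.2          | 90.2       | 1607.1              | 3.73                     | 69.4       | 263.2     |
| YOLOv8x-pose-p6 | 1280          | 71.6          | 91.2       | 4088.7              | 10.04                    | 99.1       | 1066.4    |
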
  • mAPval values are for single-model single-scale on COCO Keypoints val2017 dataset.
    Reproduce by yolo val pose data=coco-pose.yaml device=0

  • Speed averaged over COCO val images using an Amazon EC2 P4d instance.
    Reproduce by yolo val pose data=coco8-pose.yaml batch=1 device=0|cpu

  • Source : ultralytics

wget https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8s-pose.pt

1. PyTorch Model to ONNX Model

  • Export the original ONNX model with ultralytics. You can leave this repo and use the original ultralytics repo for the ONNX export.

  • CLI tool (the yolo command from "ultralytics.com")

    yolo export model=yolov8s-pose.pt format=onnx device=0 \
                imgsz=640 \
                dynamic=true \
                simplify=true

    After executing the above command, you will get an ONNX model named yolov8s-pose.onnx.

  • Move your ONNX model to a specific path on the edge device

    • Put the model on your edge device
      sudo chmod u+rwx -R /opt/nvidia/deepstream/deepstream/samples/models # add write and execute permissions
      cd /opt/nvidia/deepstream/deepstream/samples/models # the paths below are relative to this directory
      sudo mkdir -p tao_pretrained_models/YOLOv8-TensorRT
      sudo chmod u+rwx -R tao_pretrained_models/YOLOv8-TensorRT

      mv -v <path_of_your_yolov8-pose_model> /opt/nvidia/deepstream/deepstream/samples/models/tao_pretrained_models/YOLOv8-TensorRT/yolov8s-pose-dy-sim-640.onnx
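
The export step above can also be driven from Python instead of the yolo CLI. The following is a hedged sketch that mirrors the CLI flags shown earlier; the helper name `export_pose_onnx` is hypothetical, and it assumes the `ultralytics` package is installed on the machine doing the export.

```python
# Keyword arguments mirroring the CLI flags used in the yolo export command.
EXPORT_KWARGS = {"format": "onnx", "imgsz": 640, "dynamic": True, "simplify": True}


def export_pose_onnx(weights="yolov8s-pose.pt", device=0):
    """Export a YOLOv8-pose checkpoint to ONNX and return what ultralytics reports."""
    # Imported lazily so the module can be loaded without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.export(device=device, **EXPORT_KWARGS)


# Usage (needs ultralytics and the downloaded weights):
#   onnx_path = export_pose_onnx("yolov8s-pose.pt")
```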

[Optional] Execute netron yolov8s-pose.onnx to view the model architecture

  • Check model outputs
    • Note that the number of anchors for YOLOv8-Pose is 56
      • bbox(4) + confidence(1) + keypoints(3 x 17) = 4 + 1 + 0 + 51 = 56
    • The number of anchors of YOLOv7-Pose is 57
      • bbox(4) + confidence(1) + cls(1) + keypoints(3 x 17) = 4 + 1 + 1 + 51 = 57
  • Model registration information of YOLOv8S-Pose
    • INPUTS : (batch, channel, height, width)
    • OUTPUTS : (batch, anchors, max_outputs)
netron_yolov8s-pose_dy-sim-640_onnx.PNG
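
To make the 56-value layout concrete, the sketch below splits one image's output of shape (56, N) into boxes, confidences and keypoints with NumPy. It is an illustration under stated assumptions, not this repository's post-processing code: the function name `split_pose_output`, the 0.25 threshold, the (cx, cy, w, h) box convention and N = 8400 for a 640 x 640 input are assumptions made here.

```python
import numpy as np

NUM_KEYPOINTS = 17  # 3 values (x, y, score) per keypoint -> 51 channels


def split_pose_output(output, conf_thres=0.25):
    """Split a (56, N) YOLOv8-pose output into boxes, scores and keypoints."""
    preds = output.T                      # (N, 56): one row per candidate
    boxes = preds[:, :4]                  # assumed (cx, cy, w, h)
    scores = preds[:, 4]                  # single confidence value
    kpts = preds[:, 5:].reshape(-1, NUM_KEYPOINTS, 3)
    keep = scores > conf_thres            # drop low-confidence candidates
    return boxes[keep], scores[keep], kpts[keep]


# Dummy tensor standing in for one image of the batch.
dummy = np.zeros((56, 8400), dtype=np.float32)
dummy[4, :3] = 0.9                        # three confident candidates
boxes, scores, kpts = split_pose_output(dummy)
print(boxes.shape, scores.shape, kpts.shape)  # (3, 4) (3,) (3, 17, 3)
```

The candidates that survive the confidence filter still overlap heavily, so NMS has to follow.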

2. ONNX to TensorRT Engine with dynamic_batch

  • ⚠️ The engine is bound to the hardware it is built on, so build it on your edge device (expect a long wait ⌛)
  • Specify --minShapes, --optShapes and --maxShapes to set the range for dynamic batch processing.
cd /opt/nvidia/deepstream/deepstream/samples/models/tao_pretrained_models/YOLOv8-TensorRT 
sudo /usr/src/tensorrt/bin/trtexec --verbose \
    --onnx=yolov8s-pose-dy-sim-640.onnx \
    --fp16 \
    --workspace=4096 \
    --minShapes=images:1x3x640x640 \
    --optShapes=images:12x3x640x640 \
    --maxShapes=images:16x3x640x640 \
    --saveEngine=yolov8s-pose-dy-sim-640.engine
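
When several batch ranges have to be tried, the shape arguments are easy to mistype. The sketch below assembles the same trtexec command from a few parameters; the function name `trtexec_command` and its defaults are hypothetical, and the flags are only the ones used in the command above.

```python
import shlex


def trtexec_command(onnx, engine, min_batch=1, opt_batch=12, max_batch=16,
                    input_name="images", chw=(3, 640, 640)):
    """Build the trtexec argument list for a dynamic-batch engine."""
    if not min_batch <= opt_batch <= max_batch:
        raise ValueError("expected min_batch <= opt_batch <= max_batch")

    def shape(batch):
        # e.g. images:12x3x640x640
        dims = "x".join(str(d) for d in (batch, *chw))
        return f"{input_name}:{dims}"

    return [
        "/usr/src/tensorrt/bin/trtexec", "--verbose",
        f"--onnx={onnx}", "--fp16", "--workspace=4096",
        f"--minShapes={shape(min_batch)}",
        f"--optShapes={shape(opt_batch)}",
        f"--maxShapes={shape(max_batch)}",
        f"--saveEngine={engine}",
    ]


cmd = trtexec_command("yolov8s-pose-dy-sim-640.onnx",
                      "yolov8s-pose-dy-sim-640.engine")
print(shlex.join(cmd))  # pass `cmd` to subprocess.run() on the edge device
```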

3. Test and Check TensorRT Engine

/usr/src/tensorrt/bin/trtexec --loadEngine=yolov8s-pose-dy-sim-640.engine
  • or test with a multi-image batch for the dynamic-shape ONNX model

    • --shapes=spec sets the input shapes for dynamic-shape inference inputs.
    /usr/src/tensorrt/bin/trtexec  \
        --loadEngine=yolov8s-pose-dy-sim-640.engine \
        --shapes=images:12x3x640x640 
    
    • Performance on Jetson(AGX Xavier / AGX Orin) for TensorRT Engine
| model                | device     | size | batch | fps   | ms   |
|----------------------|------------|------|-------|-------|------|
| yolov8s-pose.engine  | AGX Xavier | 640  | 1     | 40.6  | 24.7 |
| yolov8s-pose.engine  | AGX Xavier | 640  | 12    | 12.1  | 86.4 |
| yolov8s-pose.engine  | AGX Orin   | 640  | 1     | 258.8 | 4.2  |
| yolov8s-pose.engine  | AGX Orin   | 640  | 12    | 34.8  | 33.2 |
| yolov7w-pose.engine* | AGX Xavier | 960  | 1     | 19.0  | 52.1 |
| yolov7w-pose.engine* | AGX Orin   | 960  | 1     | 61.1  | 16.8 |
| yolov7w-pose.pt      | AGX Xavier | 960  | 1     | 14.4  | 59.8 |
| yolov7w-pose.pt      | AGX Xavier | 960  | 1     | 11.8  | 69.4 |
  • * yolov7w-pose with the YOLO layer TensorRT plugin from nanmi/yolov7-pose. NMS not included. Single batch and image size 960 only.
  • .engine (TensorRT) models were tested with the trtexec command.
  • .pt models were tested with PyTorch on a 15 s video as a baseline.
  • NMS is not included in any of the tests.
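
Because the timings above exclude NMS, that step has to run in post-processing. The following is a minimal pure-Python sketch of greedy NMS, not this repository's implementation; it assumes boxes in (x1, y1, x2, y2) form, and the 0.45 default threshold is a placeholder.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thres=0.45):
    """Greedy NMS: keep the best box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]: the second box overlaps the first (IoU 81/119)
```

For a pose model, the keypoints of each kept index are carried along with its box.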

Basic usage

Download Repository

git clone https://github.com/YunghuiHsu/deepstream-yolo-pose.git

To run the app with default settings:


  • NVInfer with rtsp inputs

    python3 deepstream_YOLOv8-Pose_rtsp.py \
       -i  rtsp://sample_1.mp4 \
           rtsp://sample_2.mp4 \
           rtsp://sample_N.mp4
  • e.g., loop over local file inputs

    python3 deepstream_YOLOv8-Pose_rtsp.py \
        -i file:///home/ubuntu/video1.mp4 file:///home/ubuntu/video2.mp4 \
        -config dstest1_pgie_YOLOv8-Pose_config.txt \
        --file-loop
  • Default RTSP streaming location:

Note:

  1. -g/--pgie : selects the inference plugin, one of ['nvinfer', 'nvinferserver']; nvinfer is the default.
  2. -config/--config-file : must be provided for custom models.
  3. --file-loop : loops the input files after EOS.
  4. --conf-thres : object confidence threshold.
  5. --iou-thres : IoU threshold for NMS.
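
The options in the notes above map naturally onto an argparse parser. The sketch below is an illustration only and is not copied from deepstream_YOLOv8-Pose_rtsp.py: the long name `--input` and the default thresholds 0.25 and 0.45 are assumptions.

```python
import argparse


def build_parser():
    """Parser mirroring the options described in the notes (a sketch)."""
    parser = argparse.ArgumentParser(description="YOLO-Pose DeepStream app")
    parser.add_argument("-i", "--input", nargs="+", required=True,
                        help="one or more rtsp:// or file:// URIs")
    parser.add_argument("-g", "--pgie", default="nvinfer",
                        choices=["nvinfer", "nvinferserver"])
    parser.add_argument("-config", "--config-file", default=None,
                        help="inference config, needed for custom models")
    parser.add_argument("--file-loop", action="store_true",
                        help="loop input files after EOS")
    parser.add_argument("--conf-thres", type=float, default=0.25)  # placeholder
    parser.add_argument("--iou-thres", type=float, default=0.45)   # placeholder
    return parser


args = build_parser().parse_args([
    "-i", "file:///home/ubuntu/video1.mp4", "file:///home/ubuntu/video2.mp4",
    "-config", "dstest1_pgie_YOLOv8-Pose_config.txt",
    "--file-loop",
])
print(args.pgie, len(args.input), args.file_loop)
```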

This sample app is derived from NVIDIA-AI-IOT/deepstream_python_apps/apps and adds customization features

  • It includes the following:

    • Accepts multiple sources

    • Dynamic batch model (YOLO-Pose)

    • Accepts RTSP stream as input and gives out inference as RTSP stream

    • NVInfer GPU inference engine

    • NVInferserver GPU inference engine (not yet tested)

    • MultiObjectTracker (NVTracker)

    • Automatically adjusts the tensor shape of the loaded input and output (NvDsInferTensorMeta)

    • Extract the stream metadata, image data from the batched buffer of Gst-nvinfer

      imagedata-app-block-diagram.png

      source : deepstream-imagedata-multistream
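
Reading the raw output tensor in Python comes down to viewing a float buffer as a NumPy array of the right shape. The sketch below shows only that step with a simulated buffer, so it runs without DeepStream. In the real pipeline the address would come from the layer buffer attached to NvDsInferTensorMeta (assumed here to be obtained with something like `pyds.get_ptr(layer.buffer)`); the helper name `buffer_to_array` is hypothetical.

```python
import ctypes

import numpy as np


def buffer_to_array(address, shape):
    """Copy float32 memory starting at `address` into an array of `shape`."""
    ptr = ctypes.cast(address, ctypes.POINTER(ctypes.c_float))
    # as_array creates a view on the memory; copy() detaches it so the
    # data stays valid after the underlying buffer is released.
    return np.ctypeslib.as_array(ptr, shape=shape).copy()


# Simulated layer buffer: 56 channels x 8 candidates of float32 values.
shape = (56, 8)
count = shape[0] * shape[1]
backing = (ctypes.c_float * count)(*range(count))
address = ctypes.addressof(backing)  # assumption: stands in for the pyds pointer

tensor = buffer_to_array(address, shape)
print(tensor.shape, tensor.dtype, tensor[1, 0])  # (56, 8) float32 8.0
```

Copying is a deliberate choice here: the batched buffer is owned by the pipeline, so a view should not outlive the probe callback.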


Acknowledgements

Reference