yolov5-triton

YOLO v5 Object Detection on Triton Inference Server

What does this application do?

This application demonstrates the following:

  • How to prepare a TensorRT model for NVIDIA Triton Inference Server
  • How to launch NVIDIA Triton Inference Server
  • How to form a pipeline with the model ensemble
  • How to implement client applications for Triton Inference Server

Model Pipeline

The following pipeline is formed with the model ensemble.

Order | Model Name  | Backend  | Input Type | Input Dimension | Output Type | Output Dimension | Description
------|-------------|----------|------------|-----------------|-------------|------------------|--------------------------------------------------
1     | preprocess  | Python   | UINT8      | [3, 384, 640]   | FP32        | [3, 384, 640]    | Type Conversion, Normalization
2     | yolov5s_trt | TensorRT | FP32       | [3, 384, 640]   | FP32        | [15120, 85]      | Object Detection
3     | postprocess | Python   | FP32       | [15120, 85]     | FP32        | [1, -1, 6]       | Bounding Box Generation, Non-Maximum Suppression

The pipeline output [1, -1, 6] consists of 1 * N * [x0, y0, x1, y1, score, class], where:
N : The number of detected bounding boxes
(x0, y0) : The coordinates of the top-left corner of a detected bounding box
(x1, y1) : The coordinates of the bottom-right corner of a detected bounding box
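
For example, the output tensor can be unpacked on the client side as follows (a minimal sketch; the function and variable names are illustrative and not taken from this repository):

    import numpy as np

    def parse_detections(output: np.ndarray):
        """Unpack a [1, N, 6] detection tensor into (box, score, class_id) tuples."""
        results = []
        for x0, y0, x1, y1, score, class_id in output[0]:
            results.append(((float(x0), float(y0), float(x1), float(y1)),
                            float(score), int(class_id)))
        return results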

Prerequisites

Server

  • Jetson Xavier/Orin or x86_64 Linux with NVIDIA GPU
  • For Jetson, JetPack 5.0.2 or later
  • For x86_64, NGC account

Client

  • Linux (x86_64/ARM64) or Windows (x86_64)
    No GPU resources are needed on the client

Server Installation (for Jetson)

  1. Clone this repository

    git clone https://github.com/MACNICA-CLAVIS-NV/yolov5-triton
    cd yolov5-triton/server
  2. Launch PyTorch container

    ./torch_it.sh
  3. Obtain YOLO v5 ONNX model

    pip3 install -U \
    	'protobuf<4,>=3.20.2' \
    	numpy \
    	onnx \
    	pandas \
    	PyYAML \
    	tqdm \
    	matplotlib \
    	seaborn \
    	psutil \
    	gitpython \
    	scipy \
    	setuptools
    python3 torch2onnx.py yolov5s
  4. Convert ONNX model to TensorRT engine

    /usr/src/tensorrt/bin/trtexec \
    	--onnx=yolov5s.onnx \
    	--saveEngine=model.plan \
    	--workspace=4096 \
    	--exportProfile=profile.json
  5. Copy TensorRT engine to model repository

    cp model.plan ./model_repository/yolov5s_trt/1/
  6. Exit from PyTorch container

    exit
  7. Build a docker image for Triton Inference Server

    ./triton_build.sh

Server Installation (for x86_64)

An NGC account is required.

  1. Clone this repository

    git clone https://github.com/MACNICA-CLAVIS-NV/yolov5-triton
    cd yolov5-triton/server
  2. Launch PyTorch container

    ./torch_it_x86.sh
  3. Obtain YOLO v5 ONNX model

    pip3 install \
    	protobuf \
    	pandas \
    	PyYAML \
    	tqdm \
    	matplotlib \
    	seaborn \
    	gitpython
    python3 torch2onnx.py yolov5s
  4. Convert ONNX model to TensorRT engine

    /usr/src/tensorrt/bin/trtexec \
    	--onnx=yolov5s.onnx \
    	--saveEngine=model.plan \
    	--workspace=4096 \
    	--exportProfile=profile.json
  5. Copy TensorRT engine to model repository

    cp model.plan ./model_repository/yolov5s_trt/1/
  6. Exit from PyTorch container

    exit

Run Server (for Jetson)

sudo jetson_clocks
./triton_start_grpc.sh

Run Server (for x86_64)

./triton_start_grpc_x86.sh

Install Client

The client application does not need GPU resources; it can be deployed to Windows/Linux machines without a GPU card. A virtual Python environment such as conda or venv is recommended.

  1. Clone this repository

    git clone https://github.com/MACNICA-CLAVIS-NV/yolov5-triton
    cd yolov5-triton/client
  2. Install Python dependencies

    pip install tritonclient[all] Pillow opencv-python

Run Client

Image Input Inference

python infer_image.py [-h] [--url SERVER_URL] IMAGE_FILE

Example:

python infer_image.py --url localhost:8000 test.jpg
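
The repository ships infer_image.py; the sketch below only illustrates the general tritonclient HTTP flow it builds on. The model name ("yolov5s_ensemble"), tensor names ("INPUT_0", "OUTPUT_0"), and the plain resize (no letterboxing) are placeholders/assumptions, not the actual names used by this repository:

    import numpy as np
    import tritonclient.http as httpclient
    from PIL import Image

    # Placeholder names; check the repository's model configuration for the real ones.
    SERVER_URL = "localhost:8000"
    MODEL_NAME = "yolov5s_ensemble"   # assumed ensemble name
    INPUT_NAME = "INPUT_0"            # assumed input tensor name
    OUTPUT_NAME = "OUTPUT_0"          # assumed output tensor name

    client = httpclient.InferenceServerClient(url=SERVER_URL)

    # Prepare a UINT8 [3, 384, 640] tensor (simple resize; the actual client may letterbox).
    img = Image.open("test.jpg").convert("RGB").resize((640, 384))
    data = np.asarray(img, dtype=np.uint8).transpose(2, 0, 1)   # HWC -> CHW

    infer_input = httpclient.InferInput(INPUT_NAME, list(data.shape), "UINT8")
    infer_input.set_data_from_numpy(data)
    infer_output = httpclient.InferRequestedOutput(OUTPUT_NAME)

    response = client.infer(MODEL_NAME, inputs=[infer_input], outputs=[infer_output])
    detections = response.as_numpy(OUTPUT_NAME)   # shape [1, N, 6]

If the server was started with the gRPC scripts above, the same flow applies with tritonclient.grpc and the server's gRPC port (8001 by default) instead of HTTP.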

Camera Input Inference

python infer_camera.py [-h] [--camera CAMERA_ID] [--width CAPTURE_WIDTH] [--height CAPTURE_HEIGHT] [--url SERVER_URL]

Example:

python infer_camera.py --camera 1 --width 640 --height 480 --url 192.168.XXX.XXX:8000
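
For camera input, each captured frame has to be brought to the UINT8 [3, 384, 640] layout expected by the pipeline before it is sent to the server. A minimal OpenCV sketch is shown below (a plain resize is assumed; the repository's client may preprocess frames differently, e.g. with letterbox padding):

    import cv2

    cap = cv2.VideoCapture(1)                  # camera ID, as in the example above
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)     # requested capture width
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)    # requested capture height

    ret, frame = cap.read()                    # BGR, HWC, uint8
    if ret:
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frame = cv2.resize(frame, (640, 384))  # (width, height) expected by the model
        tensor = frame.transpose(2, 0, 1)      # HWC -> CHW, shape [3, 384, 640]
        # 'tensor' can now be sent to the server as in the image-input sketch above
    cap.release()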