
Runs at about 100 fps on a Jetson TX2 and about 500 fps on a GeForce GTX 1660 Ti. yolov4-tiny is implemented layer by layer with the TensorRT API. If the project is useful to you, please star it.




  • This project is based on wang-xinyu/tensorrtx and WongKinYiu/PyTorch_YOLOv4. It has been tested with TensorRT 7.0, CUDA 10.2, and cuDNN 7.6.5, where inference takes about 2 ms per image (500 fps) on a GeForce GTX 1660 Ti.

  • The project has also been tested with TensorRT 7.1.0 (Developer Preview), CUDA 10.2, and cuDNN 8.0.0 (Developer Preview), where inference takes about 10-11 ms per image (90-100 fps) on a TX2 running in MAX-N mode with jetson_clocks enabled.

  • A separate branch, "trt5", targets TensorRT 4 and 5.
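The frame rates quoted above follow from the per-image latencies. The short sketch below shows the arithmetic, assuming one image is processed at a time with no batching or pipelining (that assumption, and the helper name latency_ms_to_fps, are illustrative and not part of the project).

```python
def latency_ms_to_fps(latency_ms: float) -> float:
    """Convert a per-image latency in milliseconds to frames per second.

    Assumes one image per inference call (no batching, no overlap).
    """
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Latencies reported in the bullets above.
measurements = [
    ("GeForce GTX 1660 Ti", 2.0),
    ("TX2 (fast end of range)", 10.0),
    ("TX2 (slow end of range)", 11.0),
]

for device, latency_ms in measurements:
    fps = latency_ms_to_fps(latency_ms)
    print(f"{device}: {latency_ms:.0f} ms per image is about {fps:.0f} fps")
```

With batching, throughput can be higher than this single-image figure, so the conversion is best read as a lower bound for the quoted setup.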


(1) Generate yolov4-tiny.wts from the PyTorch implementation

  git clone -b master https://github.com/WongKinYiu/PyTorch_YOLOv4.git

// Download yolov4-tiny.pt and copy it into PyTorch_YOLOv4/weights.

// Weights download link: https://pan.baidu.com/s/1lEXCyDJyjL9B0WR-MKzAeg (extraction code: ml0o)

  git clone https://github.com/tjuskyzhang/yolov4-tiny-tensorrt.git

  cd PyTorch_YOLOv4

  cp ../yolov4-tiny-tensorrt/gen_wts.py .

  python gen_wts.py weights/yolov4-tiny.pt

// A file named 'yolov4-tiny.wts' will be generated.

  cp yolov4-tiny.wts ../yolov4-tiny-tensorrt
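For readers curious about what a .wts file holds, here is a minimal sketch of a writer and reader. It assumes the text layout commonly used by tensorrtx-style converters: a first line with the number of tensors, then one line per tensor with its name, its element count, and each float32 value as big-endian hex. This layout is an assumption here; check gen_wts.py itself for the exact format. The tensor names and helper functions below are hypothetical.

```python
import os
import struct
import tempfile


def write_wts(tensors: dict, path: str) -> None:
    """Write {name: [floats]} in the assumed .wts text layout."""
    with open(path, "w") as f:
        f.write(f"{len(tensors)}\n")
        for name, values in tensors.items():
            # Each value is packed as a big-endian float32 and hex-encoded.
            hex_words = [struct.pack(">f", float(v)).hex() for v in values]
            f.write(" ".join([name, str(len(values))] + hex_words) + "\n")


def read_wts(path: str) -> dict:
    """Read a file in the assumed .wts layout back into {name: [floats]}."""
    tensors = {}
    with open(path) as f:
        count = int(f.readline())
        for _ in range(count):
            fields = f.readline().split()
            name, size, words = fields[0], int(fields[1]), fields[2:]
            if len(words) != size:
                raise ValueError(f"{name}: expected {size} values, got {len(words)}")
            tensors[name] = [struct.unpack(">f", bytes.fromhex(w))[0] for w in words]
    return tensors


# Round-trip a tiny, made-up set of weights (names are hypothetical).
demo = {"demo.conv.weight": [0.5, -1.0, 2.0], "demo.conv.bias": [0.0]}
with tempfile.TemporaryDirectory() as tmp:
    wts_path = os.path.join(tmp, "demo.wts")
    write_wts(demo, wts_path)
    with open(wts_path) as f:
        print(f.read())
    print(read_wts(wts_path) == demo)
```

A plain-text format like this is easy to inspect and to parse from C++ without linking PyTorch, which is presumably why the conversion happens as a separate step before the build.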

(2) Build and run

  cd yolov4-tiny-tensorrt

  mkdir build

  cd build

  cmake ..

  make

// Serialize the model and generate yolov4-tiny.engine

  ./yolov4-tiny -s

// Deserialize the engine and run detection on the images in ../samples. Results are saved as _dog.jpg and so on.

  ./yolov4-tiny -d ../samples
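To see which result files to expect after the detection run, the sketch below lists the images in a samples directory and maps each to an output name with a leading underscore, as in the _dog.jpg example. It assumes every result is named by prefixing the input file name with "_"; the set of image extensions and the helper name plan_outputs are likewise assumptions, not taken from the project.

```python
import tempfile
from pathlib import Path

# Assumed set of image extensions; the project may accept others.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def plan_outputs(samples_dir) -> dict:
    """Map each image file name in samples_dir to its expected result name."""
    samples = Path(samples_dir)
    return {
        p.name: "_" + p.name
        for p in sorted(samples.iterdir())
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    }


# Demonstrate on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("dog.jpg", "horses.jpg", "notes.txt"):
        (Path(tmp) / name).touch()
    for source, result in plan_outputs(tmp).items():
        print(f"{source} -> {result}")
```

Pointing plan_outputs at the real ../samples directory gives a quick checklist to compare against the files produced by the run.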