
compile yolov3 in TVM


Compile darknet on tvm

This is a demo of yolov3 on TVM.

Environments Setup

  1. Install TVM

    1. Requirements
    sudo apt-get update
    sudo apt-get install -y python3 python3-dev python3-setuptools gcc libtinfo-dev zlib1g-dev build-essential cmake libedit-dev libxml2-dev
    2. Download the LLVM pre-built binary that matches your OS
    Unzip the LLVM directory under tvm-yolov3/
    3. Compile (modify build/cmake.config if needed)
    cd build/ && cmake ..
    make -j8
    4. Python Package Installation
    export TVM_HOME=/path/to/tvm-yolov3
    export PYTHONPATH=$TVM_HOME/python:$TVM_HOME/topi/python:${PYTHONPATH}
    5. Install Python Dependencies

    pip install -r requirements.txt

    For other TVM installation issues, please refer to the TVM website.

  2. Prepare Data

    1. Download the yolov3 weights and unzip them under tvm-yolov3/
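
Before running the demo, it can help to confirm that the setup steps above took effect. The following is a minimal sketch, not part of this repository: the helper name check_setup is hypothetical, and it assumes you run it from tvm-yolov3/ with the file names used in the demo script (yolov3.cfg, yolov3.weights, test.jpg).

```python
import importlib.util
from pathlib import Path


def check_setup(modules, files, root="."):
    """Return (missing_modules, missing_files) for the given names.

    Hypothetical helper for illustration; it only checks that top-level
    modules can be located and that files exist under `root`.
    """
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_files = [f for f in files if not (Path(root) / f).is_file()]
    return missing_modules, missing_files


if __name__ == "__main__":
    # Module and file names are taken from the demo script in this README.
    mods, files = check_setup(
        modules=["tvm", "cv2", "numpy"],
        files=["yolov3.cfg", "yolov3.weights", "test.jpg"],
    )
    print("missing modules:", mods or "none")
    print("missing files:", files or "none")
```

If tvm is reported missing, the PYTHONPATH export from the installation step is the first thing to re-check.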

Run and Testing

import tvm.relay.frontend.yolov3 as yolov3
import cv2
import numpy as np

# Load the test image as a NumPy array (OpenCV reads images in BGR order)
test_image = 'test.jpg'
imagex = cv2.imread(test_image)
imagex = np.array(imagex)

config = {
    'img': imagex,
    'cfg_path': 'yolov3.cfg',
    'weights_path': 'yolov3.weights',
    'device_type': 'cuda-cudnn',  # one of: cpu, cuda, cuda-cudnn
    'autotune': True,
    'log_file': 'yolov3_auto.log',  # log file generated by autotuning.py
    'thresh': 0.5,
    'nms_thresh': 0.45
}

dets = yolov3.run(config)
  • Sample Output: (one row per detected object: class label followed by bbox coordinates)
#[ [class, left, top, right, bottom],     # object 1
#  [class, left, top, right, bottom],     # object 2
#  ... ]
[[60, 0, 180, 825, 691], [39, 464, 190, 558, 443], [39, 274, 129, 389, 462], [39, 213, 130, 300, 374], [39, 10, 95, 140, 409]]
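
The rows above are plain lists, so a small wrapper can make them easier to work with. This is a sketch under stated assumptions: Detection and parse_detections are hypothetical names, not part of this repository, and the class names assume the indices follow the 0-based ordering of Darknet's coco.names (only the two indices from the sample output are mapped).

```python
from typing import NamedTuple

# Assumption: class indices follow Darknet's coco.names ordering (0-based).
# Only the two indices that appear in the sample output are listed.
ASSUMED_CLASS_NAMES = {39: "bottle", 60: "diningtable"}


class Detection(NamedTuple):
    class_id: int
    left: int
    top: int
    right: int
    bottom: int

    @property
    def width(self):
        return self.right - self.left

    @property
    def height(self):
        return self.bottom - self.top

    @property
    def label(self):
        # Fall back to a generic name for indices not in the assumed mapping.
        return ASSUMED_CLASS_NAMES.get(self.class_id, f"class_{self.class_id}")


def parse_detections(dets):
    """Convert [class, left, top, right, bottom] rows into Detection tuples."""
    return [Detection(*row) for row in dets]


if __name__ == "__main__":
    sample = [[60, 0, 180, 825, 691], [39, 464, 190, 558, 443]]
    for d in parse_detections(sample):
        print(f"{d.label:12s} at ({d.left}, {d.top}) size {d.width}x{d.height}")
```

For the first sample row this gives a box 825 pixels wide and 511 pixels high.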

!!! The fastest method is CUDA with autotuning, but you must first run python autotuning.py to generate the log file.

!!! Autotuning takes a long time: the sample run below spent about 9 hours across its 12 tasks.

  • Autotuning

python autotuning.py

Extract tasks...
[Task  1/12]  Current/Best:  598.05/2497.63 GFLOPS | Progress: (252/252) | 1357.95 s Done.
[Task  2/12]  Current/Best:  522.63/2279.24 GFLOPS | Progress: (784/784) | 3989.60 s Done.
[Task  3/12]  Current/Best:  447.33/1927.69 GFLOPS | Progress: (784/784) | 3869.14 s Done.
[Task  4/12]  Current/Best:  481.11/1912.34 GFLOPS | Progress: (672/672) | 3274.25 s Done.
[Task  5/12]  Current/Best:  414.09/1598.45 GFLOPS | Progress: (672/672) | 2720.78 s Done.
[Task  6/12]  Current/Best:  508.96/2273.20 GFLOPS | Progress: (768/768) | 3718.75 s Done.
[Task  7/12]  Current/Best:  469.14/1955.79 GFLOPS | Progress: (576/576) | 2665.67 s Done.
[Task  8/12]  Current/Best:  230.91/1658.97 GFLOPS | Progress: (576/576) | 2435.01 s Done.
[Task  9/12]  Current/Best:  487.75/2295.19 GFLOPS | Progress: (648/648) | 3009.95 s Done.
[Task 10/12]  Current/Best:  182.33/1734.45 GFLOPS | Progress: (360/360) | 1755.06 s Done.
[Task 11/12]  Current/Best:  372.18/1745.15 GFLOPS | Progress: (360/360) | 1684.50 s Done.
[Task 12/12]  Current/Best:  215.34/2271.11 GFLOPS | Progress: (400/400) | 2128.74 s Done.
Evaluate inference time cost...
Mean inference time (std dev): 3.16 ms (0.03 ms)
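
To see where the tuning time goes, the progress lines above can be summarised with a short script. This is a minimal sketch that assumes the console output has been saved as text in the format shown above; it does not read the tuning log file itself, and the names PROGRESS_RE and summarize are hypothetical.

```python
import re

# Matches console lines such as:
# [Task  1/12]  Current/Best:  598.05/2497.63 GFLOPS | Progress: (252/252) | 1357.95 s Done.
PROGRESS_RE = re.compile(
    r"\[Task\s+(\d+)/\s*(\d+)\]\s+Current/Best:\s+([\d.]+)/\s*([\d.]+)\s+GFLOPS"
    r"\s*\|\s*Progress:\s*\((\d+)/(\d+)\)\s*\|\s*([\d.]+)\s+s"
)


def summarize(console_text):
    """Return one (task, best_gflops, trials, seconds) tuple per progress line."""
    rows = []
    for m in PROGRESS_RE.finditer(console_text):
        rows.append(
            (int(m.group(1)), float(m.group(4)), int(m.group(6)), float(m.group(7)))
        )
    return rows


if __name__ == "__main__":
    text = (
        "[Task  1/12]  Current/Best:  598.05/2497.63 GFLOPS | Progress: (252/252) | 1357.95 s Done.\n"
        "[Task  2/12]  Current/Best:  522.63/2279.24 GFLOPS | Progress: (784/784) | 3989.60 s Done.\n"
    )
    rows = summarize(text)
    total_s = sum(r[3] for r in rows)
    print(f"{len(rows)} tasks, {total_s:.2f} s total ({total_s / 3600:.2f} h)")
```

Summing the last column over all 12 tasks gives the total tuning time for a run.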

Results: (RTX 2080 Ti)

                   Darknet   TVM       AutoTVM
cuda10.2           ~300ms    ~170ms    7~8ms
cuda10.2+cudnn7    ~13ms     8~9ms     -
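
The relative speedups implied by these figures can be worked out with a few lines of arithmetic. This is only a rough sketch: the table values are approximate, and ranges such as 7~8ms are represented here by their midpoint, which is an assumption made for illustration.

```python
# Approximate latencies in ms taken from the results table; ranges such as
# "7~8ms" are replaced by their midpoint (an assumption for illustration).
latency_ms = {
    ("cuda10.2", "Darknet"): 300.0,
    ("cuda10.2", "TVM"): 170.0,
    ("cuda10.2", "AutoTVM"): 7.5,
    ("cuda10.2+cudnn7", "Darknet"): 13.0,
    ("cuda10.2+cudnn7", "TVM"): 8.5,
}


def speedup(baseline, candidate):
    """How many times faster `candidate` is than `baseline`."""
    return latency_ms[baseline] / latency_ms[candidate]


if __name__ == "__main__":
    pairs = [
        (("cuda10.2", "Darknet"), ("cuda10.2", "AutoTVM")),
        (("cuda10.2", "TVM"), ("cuda10.2", "AutoTVM")),
        (("cuda10.2+cudnn7", "Darknet"), ("cuda10.2+cudnn7", "TVM")),
    ]
    for base, cand in pairs:
        print(f"{base} -> {cand}: {speedup(base, cand):.1f}x")
```

With these midpoints, AutoTVM on plain CUDA comes out roughly 40x faster than Darknet on plain CUDA, and slightly faster than either cuDNN configuration.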

