ardianumam/Tensorflow-TensorRT

INT8 support

Closed this issue · 5 comments

So I tried using INT8 instead of FP16 for optimizing YOLOv3. Instead of getting a speedup, it was taking 1200+ ms per image.

My environment:
Ubuntu 18.10
Python 3.7.1
CUDA 10.0
cuDNN 7.5.0
tensorflow-gpu 1.13.1
TensorRT 5.0.2.6
GTX 1070

Have you calibrated the graph? In case you haven't, see this link (near the end of the article).
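
For anyone else hitting this, here is a minimal sketch of the TF-TRT INT8 calibration flow on TF 1.13 (`tensorflow.contrib.tensorrt`). The names `frozen_graph_def`, `output_names`, the input tensor name, and `calibration_batches` are placeholders; substitute whatever this repo's scripts actually use.

```python
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Step 1: build a calibration graph. With precision_mode="INT8" this does
# NOT yet produce INT8 engines; it instruments the graph for calibration.
calib_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph_def,      # your frozen YOLOv3 GraphDef
    outputs=output_names,                  # list of output node names
    max_batch_size=8,
    max_workspace_size_bytes=1 << 30,
    precision_mode="INT8")

# Step 2: run representative images through the calibration graph so
# TensorRT can record activation ranges. Skipping this (or feeding random
# data) yields poor scales, which can hurt both speed and accuracy.
with tf.Graph().as_default() as g:
    outs = tf.import_graph_def(
        calib_graph,
        return_elements=[n + ":0" for n in output_names])
    with tf.Session(graph=g) as sess:
        for batch in calibration_batches:  # real preprocessed images
            # "import/input_image:0" is a placeholder input tensor name
            sess.run(outs, feed_dict={"import/input_image:0": batch})

# Step 3: convert the calibrated graph into the final INT8 inference graph.
int8_graph = trt.calib_graph_to_infer_graph(calib_graph)
```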

Thank you so much. Will give it a shot and update here :)

Okay, I checked out the link. I will prepare a dataset for calibration. Meanwhile, you set the max batch size in the create_inference_graph() method. How do we use this batch size during inference?

Never mind, I figured it out. I froze the graph again with the input tensor shape set to [None, 416, 416, 3], which allows batched inference.
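
For later readers, a sketch of that re-freezing step under stated assumptions: `build_model`, the checkpoint path, and the output node names are placeholders for whatever this repo's freezing script uses.

```python
import tensorflow as tf
from tensorflow.python.framework import graph_util

def freeze_with_dynamic_batch(build_model, ckpt_path, output_node_names, pb_path):
    # build_model is a placeholder for the repo's network constructor.
    with tf.Graph().as_default() as g:
        # Leading None keeps the batch dimension dynamic, so the frozen
        # graph accepts any batch size up to the max_batch_size later
        # passed to create_inference_graph().
        inputs = tf.placeholder(tf.float32, [None, 416, 416, 3],
                                name="input_image")
        build_model(inputs)
        with tf.Session(graph=g) as sess:
            tf.train.Saver().restore(sess, ckpt_path)
            frozen = graph_util.convert_variables_to_constants(
                sess, g.as_graph_def(), output_node_names)
    with tf.gfile.GFile(pb_path, "wb") as f:
        f.write(frozen.SerializeToString())
```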