INT8 support
kingardor commented
So I tried using INT8 instead of FP16 to optimize YOLOv3. Instead of getting a speedup, inference was taking 1200+ ms per image.
My environment:
Ubuntu 18.10
Python 3.7.1
CUDA 10.0
cuDNN 7.5.0
tensorflow-gpu 1.13.1
TensorRT 5.0.2.6
GTX 1070
ardianumam commented
Have you calibrated the graph? In case you haven't, see this link (near the end of the article).
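For reference, here is a minimal sketch of the INT8 calibration flow that article describes, using the `tf.contrib.tensorrt` API that ships with TensorFlow 1.13. Names such as `frozen_graph_def`, `output_names`, `calibration_batches`, and the tensor name `input_1:0` are placeholders for your own frozen YOLOv3 graph and calibration data, not part of this repo:

```python
# Sketch only: assumes `frozen_graph_def` (a tf.GraphDef), `output_names`
# (list of output node names), and `calibration_batches` (an iterable of
# preprocessed image batches) are already defined for your model.
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Step 1: in INT8 mode, create_inference_graph() first returns a calibration graph.
calib_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph_def,
    outputs=output_names,
    max_batch_size=1,                    # calibration batches should not exceed this
    max_workspace_size_bytes=1 << 30,
    precision_mode="INT8")

# Step 2: run representative images through the calibration graph so the
# TensorRT nodes can collect activation ranges.
with tf.Graph().as_default():
    tf.import_graph_def(calib_graph, name="")
    with tf.Session() as sess:
        input_tensor = sess.graph.get_tensor_by_name("input_1:0")  # adjust to your input name
        output_tensor = sess.graph.get_tensor_by_name(output_names[0] + ":0")
        for batch in calibration_batches:
            sess.run(output_tensor, feed_dict={input_tensor: batch})

# Step 3: convert the calibrated graph into the final INT8 inference graph.
int8_graph = trt.calib_graph_to_infer_graph(calib_graph)
```

Without this calibration step, INT8 mode has no activation ranges to quantize with, which is a common reason it ends up slower than FP16.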
kingardor commented
Thank you so much. Will give it a shot and update here :)
kingardor commented
Okay, I checked out the link. I will prepare a dataset for calibration. Meanwhile, you set the max batch size in the create_inference_graph() method. How do we use this batch size during inference?
kingardor commented
Thanks, buddy. Checked it out.
Also, I wanted help regarding batch inference. You mentioned the max batch size in the create_inference_graph() method. How do I feed a batch of images to the model?
kingardor commented
Never mind, I figured it out. I froze the graph again with the input tensor shape [None, 416, 416, 3], which allows for batch inference.
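For anyone else reading, a minimal sketch of what batch inference then looks like with such a re-frozen graph. The file name `trt_yolov3_int8.pb` and the tensor names `input_1:0` / `output_boxes:0` are hypothetical placeholders, not the actual names used in this repo:

```python
# Sketch only: loads a frozen graph whose input has shape [None, 416, 416, 3]
# and feeds it a whole batch of images in a single sess.run() call.
import numpy as np
import tensorflow as tf

with tf.gfile.GFile("trt_yolov3_int8.pb", "rb") as f:   # hypothetical file name
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")

input_tensor = graph.get_tensor_by_name("input_1:0")       # hypothetical input name
output_tensor = graph.get_tensor_by_name("output_boxes:0") # hypothetical output name

# Stack preprocessed images along axis 0; the leading None dimension lets the
# session accept any batch size up to the max_batch_size the engine was built with.
batch = np.random.rand(8, 416, 416, 3).astype(np.float32)  # stand-in for real images

with tf.Session(graph=graph) as sess:
    detections = sess.run(output_tensor, feed_dict={input_tensor: batch})
    print(detections.shape)
```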