INT8 support
kingardor commented
So I tried using INT8 instead of FP16 to optimize YOLOv3. Instead of getting a speedup, inference was taking 1200+ ms per image.
My environment:
Ubuntu 18.10
Python 3.7.1
CUDA 10.0
cuDNN 7.5.0
tensorflow-gpu 1.13.1
TensorRT 5.0.2.6
GTX 1070
ardianumam commented
Have you calibrated the graph? In case you haven't, see this link (near the end of the article).
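For reference, here is a minimal sketch of the INT8 calibration flow that article describes, using the `tf.contrib.tensorrt` API that ships with TensorFlow 1.13. Names such as `frozen_graph_def`, `output_names`, `calibration_batches`, and the tensor name `input_1:0` are placeholders for your own frozen YOLOv3 graph and calibration data, not part of this repo:

```python
# Sketch only: assumes `frozen_graph_def` (a tf.GraphDef), `output_names`
# (list of output node names), and `calibration_batches` (an iterable of
# preprocessed image batches) are already defined for your model.
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Step 1: in INT8 mode, create_inference_graph() first returns a calibration graph.
calib_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph_def,
    outputs=output_names,
    max_batch_size=1,                    # calibration batches should not exceed this
    max_workspace_size_bytes=1 << 30,
    precision_mode="INT8")

# Step 2: run representative images through the calibration graph so the
# TensorRT nodes can collect activation ranges.
with tf.Graph().as_default():
    tf.import_graph_def(calib_graph, name="")
    with tf.Session() as sess:
        input_tensor = sess.graph.get_tensor_by_name("input_1:0")  # adjust to your input name
        output_tensor = sess.graph.get_tensor_by_name(output_names[0] + ":0")
        for batch in calibration_batches:
            sess.run(output_tensor, feed_dict={input_tensor: batch})

# Step 3: convert the calibrated graph into the final INT8 inference graph.
int8_graph = trt.calib_graph_to_infer_graph(calib_graph)
```

Without this calibration step, INT8 mode has no activation ranges to quantize with, which is a common reason it ends up slower than FP16.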
kingardor commented
Thank you so much. Will give it a shot and update here :)
kingardor commented
Okay, I checked out the link. I will prepare a dataset for calibration. Meanwhile, you set the max batch size in the create_inference_graph() method. How do we use this batch size during inference?
kingardor commented
Thanks, buddy. Checked it out.
Also, I wanted help regarding batch inference. You mentioned the max batch size in the create_inference_graph() method. How do I feed a batch of images to the model?
kingardor commented
Never mind, I figured it out. I froze the graph again with the input tensor shape [None, 416, 416, 3], which allows for batch inference.
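For anyone else reading, a minimal sketch of what batch inference then looks like with such a re-frozen graph. The file name `trt_yolov3_int8.pb` and the tensor names `input_1:0` / `output_boxes:0` are hypothetical placeholders, not the actual names used in this repo:

```python
# Sketch only: loads a frozen graph whose input has shape [None, 416, 416, 3]
# and feeds it a whole batch of images in a single sess.run() call.
import numpy as np
import tensorflow as tf

with tf.gfile.GFile("trt_yolov3_int8.pb", "rb") as f:   # hypothetical file name
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")

input_tensor = graph.get_tensor_by_name("input_1:0")       # hypothetical input name
output_tensor = graph.get_tensor_by_name("output_boxes:0") # hypothetical output name

# Stack preprocessed images along axis 0; the leading None dimension lets the
# session accept any batch size up to the max_batch_size the engine was built with.
batch = np.random.rand(8, 416, 416, 3).astype(np.float32)  # stand-in for real images

with tf.Session(graph=graph) as sess:
    detections = sess.run(output_tensor, feed_dict={input_tensor: batch})
    print(detections.shape)
```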