Hardware support for inference
Closed this issue · 1 comment
Sheldon04 commented
Hi, the paper does not mention how to implement NVIDIA GPU support for running inference with the quantized model. Could you please give some explanation? Thanks very much!
Cheeun commented
Thanks for your interest in our work and sorry for the delayed answer!
Similar to HAQ, which uses a latency lookup table, we compute the latency for each image by summing each convolutional layer's latency (plus that of the bit selector network). You can refer to that project for details on measuring each convolutional layer's latency at the selected bit-widths.
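For illustration, here is a minimal sketch of the lookup-table idea described above. The layer names, bit-width choices, and latency values are all made up; in practice each table entry would come from profiling the corresponding convolutional layer at that bit-width on the target GPU.

```python
# Hypothetical lookup table: (layer_name, bit_width) -> measured latency in ms.
# Real entries would be obtained by benchmarking each layer on the target hardware.
latency_table = {
    ("conv1", 4): 0.21, ("conv1", 8): 0.35,
    ("conv2", 4): 0.18, ("conv2", 8): 0.30,
    ("bit_selector", 8): 0.05,  # the bit selector network's own cost
}

def image_latency(selected_bits, table):
    """Estimate per-image latency by summing each layer's table entry."""
    return sum(table[(layer, bits)] for layer, bits in selected_bits.items())

# Bit-widths selected for one example image (plus the bit selector network)
choice = {"conv1": 4, "conv2": 8, "bit_selector": 8}
print(round(image_latency(choice, latency_table), 2))  # 0.21 + 0.30 + 0.05 = 0.56
```

Because the per-layer latencies are precomputed, estimating the latency of any bit-width assignment reduces to a table lookup and a sum, with no need to re-benchmark the full model.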