Hardware support for inference
Closed this issue · 1 comment
Sheldon04 commented
Hi, the paper does not mention how to implement NVIDIA GPU support for running inference with the quantized model. Could you please give some explanation? Thanks very much!
Cheeun commented
Thanks for your interest in our work and sorry for the delayed answer!
Similar to HAQ, which uses a latency lookup table, we compute the latency for each image by summing each convolutional layer's latency (plus that of the bit selector network). You can refer to that project for details on measuring each convolutional layer's latency at the selected bit-widths.
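For illustration, here is a minimal sketch of the lookup-table idea described above. The layer names, bit-width choices, and latency values are all made up; in practice each table entry would come from profiling the corresponding convolutional layer at that bit-width on the target GPU.

```python
# Hypothetical lookup table: (layer_name, bit_width) -> measured latency in ms.
# Real entries would be obtained by benchmarking each layer on the target hardware.
latency_table = {
    ("conv1", 4): 0.21, ("conv1", 8): 0.35,
    ("conv2", 4): 0.18, ("conv2", 8): 0.30,
    ("bit_selector", 8): 0.05,  # the bit selector network's own cost
}

def image_latency(selected_bits, table):
    """Estimate per-image latency by summing each layer's table entry."""
    return sum(table[(layer, bits)] for layer, bits in selected_bits.items())

# Bit-widths selected for one example image (plus the bit selector network)
choice = {"conv1": 4, "conv2": 8, "bit_selector": 8}
print(round(image_latency(choice, latency_table), 2))  # 0.21 + 0.30 + 0.05 = 0.56
```

Because the per-layer latencies are precomputed, estimating the latency of any bit-width assignment reduces to a table lookup and a sum, with no need to re-benchmark the full model.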