How to get the int8 model?
Thank you for your work. After training the quantized model, its inference speed is still slow. How can I get a faster int8 model? Should I use TensorRT? Could you describe the next steps?
Hi, SimpleDet only provides the simulated quantization method proposed in [1].
After training, you can obtain the quantization range of each layer.
For faster inference, you should deploy with TensorRT or TVM and adopt the per-layer quantization ranges obtained from int8 training (see the sketch below the reference).
[1] Jacob, Benoit, et al. "Quantization and training of neural networks for efficient integer-arithmetic-only inference." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
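For concreteness, here is a minimal sketch of how the per-layer ranges might be handed to TensorRT by setting each tensor's dynamic range directly, instead of running TensorRT's own calibration. This is not part of SimpleDet: the range-file format (a JSON map from tensor name to maximum absolute value) and the ONNX input are assumptions, and the exact TensorRT API differs between versions (the calls below follow the TensorRT 7/8 Python bindings).

```python
# Hypothetical deployment sketch: build an INT8 TensorRT engine from an ONNX
# model, using per-tensor ranges exported from simulated-quantization training
# instead of TensorRT's entropy calibrator.
import json
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_int8_engine(onnx_path, ranges_path, engine_path):
    # Assumed range-file format: {"tensor_name": max_abs_value, ...}
    with open(ranges_path) as f:
        ranges = json.load(f)

    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.INT8)

    # Apply the training-time quantization range to every tensor we know about;
    # tensors without a recorded range fall back to TensorRT's defaults.
    def set_range(tensor):
        if tensor.name in ranges:
            r = abs(ranges[tensor.name])
            tensor.set_dynamic_range(-r, r)

    for i in range(network.num_inputs):
        set_range(network.get_input(i))
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        for j in range(layer.num_outputs):
            set_range(layer.get_output(j))

    engine = builder.build_engine(network, config)
    with open(engine_path, "wb") as f:
        f.write(engine.serialize())
```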
Is it necessary to convert the MXNet model to ONNX format? I cannot convert it directly. Could you please give me some advice? Thank you so much!
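For reference (not a maintainer answer): MXNet 1.x ships a built-in ONNX exporter, so a symbol/params checkpoint can often be converted without a third-party tool, provided all operators in the graph are supported. A minimal sketch follows; the file names and the input shape are placeholders, not SimpleDet defaults.

```python
# Hypothetical sketch of exporting an MXNet symbol/params checkpoint to ONNX
# using MXNet's built-in exporter (mxnet.contrib.onnx in MXNet 1.x).
import numpy as np
from mxnet.contrib import onnx as onnx_mxnet

sym = "model-symbol.json"          # placeholder: saved network definition
params = "model-0000.params"       # placeholder: trained weights
input_shape = [(1, 3, 800, 1333)]  # placeholder: adjust to the detector's input

onnx_file = onnx_mxnet.export_model(
    sym, params, input_shape, np.float32, "model.onnx", verbose=True)
print("exported:", onnx_file)
```

Custom operators used by the detector that the exporter does not cover would still need to be registered or replaced before conversion.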