mit-han-lab/lite-transformer

Quantization

zilunpeng opened this issue · 1 comment

Could you share some more information on how you quantize the model? Did you use any packages for quantization?

Sorry for the late reply. We did not use any additional packages for quantization. For simplicity, we manually read the PyTorch checkpoint of the trained model and applied k-means quantization to the model weights: each floating-point weight is mapped to an 8-bit integer code (one of 256 cluster centroids) and then mapped back to float for inference (with some precision loss).
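
For reference, here is a minimal sketch of what that procedure might look like, assuming scikit-learn's `KMeans` for clustering; the function names, the checkpoint filename, and the `"model"` state-dict key (common in fairseq checkpoints) are illustrative assumptions, not code from the lite-transformer repo:

```python
# Sketch of 8-bit k-means weight quantization on a PyTorch checkpoint.
# Names are illustrative; this is not the authors' actual script.
import torch
from sklearn.cluster import KMeans

def kmeans_quantize(weight: torch.Tensor, n_bits: int = 8) -> torch.Tensor:
    """Cluster weights into 2**n_bits centroids, then map each weight
    back to its nearest centroid (float in, float out, lossy)."""
    flat = weight.detach().cpu().numpy().reshape(-1, 1)
    n_clusters = min(2 ** n_bits, flat.shape[0])
    km = KMeans(n_clusters=n_clusters, n_init=1).fit(flat)
    codes = km.labels_                    # 8-bit codes you could store as int8
    dequant = km.cluster_centers_[codes]  # re-mapped back to float
    return torch.from_numpy(dequant.reshape(weight.shape)).to(weight.dtype)

# Quantize every float weight matrix in a checkpoint, then run
# ordinary inference on the dequantized weights.
ckpt = torch.load("checkpoint.pt", map_location="cpu")  # hypothetical path
state = ckpt["model"] if "model" in ckpt else ckpt
for name, tensor in state.items():
    if tensor.is_floating_point() and tensor.dim() > 1:
        state[name] = kmeans_quantize(tensor)
torch.save(ckpt, "checkpoint_kmeans8bit.pt")
```

Because the codes are mapped back to float before inference, the model runs unchanged; only the weight values change, which is why the reported accuracy reflects the precision loss rather than any integer-kernel speedup.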