Quantization
zilunpeng opened this issue · 1 comment
zilunpeng commented
Could you share some more information on how you quantize the model? Did you use any packages for quantization?
Michaelvll commented
Sorry for the late reply. We did not use any additional packages for quantization. For simplicity, we manually read the PyTorch checkpoint of the trained model and applied k-means quantization to the model weights: the floating-point weights are mapped to 8-bit integer codes, which are mapped back to floats for inference (with some precision loss).
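The repo itself doesn't include the quantization script, but the approach described above can be sketched roughly as follows. This is a minimal NumPy illustration (not the authors' actual code): 1-D k-means over the flattened weights produces a 256-entry codebook, each weight is stored as an 8-bit index into that codebook, and dequantization looks the index back up. The function names and iteration count are illustrative assumptions.

```python
import numpy as np

def kmeans_quantize(weight, n_bits=8, n_iter=20):
    """Quantize a float tensor to n_bits-wide codes via 1-D k-means
    (Lloyd's algorithm) over the flattened values.
    Returns (codes, codebook); reconstruct with codebook[codes]."""
    k = 2 ** n_bits
    flat = weight.ravel().astype(np.float64)
    # Initialize centroids from quantiles so they span the value range.
    centroids = np.quantile(flat, np.linspace(0.0, 1.0, k))
    for _ in range(n_iter):
        # Assign each weight to its nearest centroid.
        codes = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Move each centroid to the mean of its assigned weights.
        for j in range(k):
            members = flat[codes == j]
            if members.size:
                centroids[j] = members.mean()
    codes = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
    return codes.reshape(weight.shape).astype(np.uint8), centroids.astype(weight.dtype)

def dequantize(codes, codebook):
    """Map integer codes back to floats for inference (lossy)."""
    return codebook[codes]
```

Note that the pairwise distance matrix here is O(N·k); for a full model checkpoint you would quantize layer by layer (or in chunks) rather than all weights at once.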