Question about the Quantization process
JordanChua opened this issue · 2 comments
Hi Lorenzo, thanks for the paper, it was a good read! I'm trying to implement a quantization pipeline on a trained model, and I was hoping to refer to the compression pipeline you implemented in this paper, mainly QAT followed by quantization and entropy coding.
I was hoping to get your input on how I could achieve this. Thanks a lot!
Hello, thanks for your interest! Most of the quantization code can be found in https://github.com/aegroto/nif/blob/master/compression/__init__.py. Basically, the original floating-point values are normalized and then quantized to 8-bit integers in the range [-128, 127].
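A minimal sketch of that normalize-then-quantize step, assuming a symmetric scheme that normalizes by the maximum absolute value (the exact normalization in the repo may differ):

```python
import numpy as np

def quantize_tensor(values: np.ndarray):
    # Normalize by the largest absolute value so the data fits in [-1, 1]
    # (an assumed scheme; check compression/__init__.py for the exact one).
    scale = float(np.abs(values).max())
    normalized = values / scale
    # Map to 8-bit integers in [-128, 127].
    quantized = np.clip(np.round(normalized * 127), -128, 127).astype(np.int8)
    return quantized, scale

def dequantize_tensor(quantized: np.ndarray, scale: float) -> np.ndarray:
    # Invert the mapping; precision loss is bounded by half a quantization step.
    return quantized.astype(np.float32) / 127 * scale
```

The scale factor has to be stored alongside the quantized tensor so the decoder can reconstruct the floating-point values.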
This quantization is used in QAT here:
Line 34 in aac23fd
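One common way to account for quantization during training is to inject noise matching the quantization step into the weights in the forward pass. This is a sketch of that idea, not necessarily the exact code at the line referenced above; the half-step uniform noise magnitude is an assumption:

```python
import numpy as np

def add_quantization_noise(values: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    # Simulate 8-bit quantization error during training: with a symmetric
    # [-128, 127] quantizer, the rounding error is at most half a step.
    scale = float(np.abs(values).max())
    step = scale / 127
    noise = rng.uniform(-step / 2, step / 2, size=values.shape)
    return values + noise
```

Training against this noisy view of the weights makes the model robust to the rounding applied at compression time.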
Entropy coding is done by applying brotli to the quantized tensors cast to numpy int8 arrays:
Line 37 in aac23fd
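The roundtrip is essentially "int8 array → raw bytes → general-purpose compressor". A sketch of that flow, with the standard library's zlib standing in for brotli (both expose the same compress/decompress-over-bytes interface; swap in `brotli.compress`/`brotli.decompress` to match the repo):

```python
import zlib
import numpy as np

def entropy_encode(quantized: np.ndarray) -> bytes:
    # The repo applies brotli to the raw int8 bytes; zlib is used here only
    # as a stdlib stand-in with the same bytes-in/bytes-out interface.
    assert quantized.dtype == np.int8
    return zlib.compress(quantized.tobytes())

def entropy_decode(payload: bytes, shape) -> np.ndarray:
    # Decompress and reinterpret the bytes as an int8 array of known shape.
    return np.frombuffer(zlib.decompress(payload), dtype=np.int8).reshape(shape)
```

The tensor shape (like the scale factor) must be stored in the bitstream's metadata, since the raw byte string alone does not recover it.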
I hope these tips help you with your research. Feel free to ask more if you have any doubts.
Thanks a lot for the help Lorenzo! As you suggested, I have managed to get the naive implementation of QAT working, and I'm now working on the approach you suggested that takes the quantization noise into account.