Is the model really Quantized?
navinranjan7 opened this issue · 3 comments
navinranjan7 commented
Is the model really Quantized?
navinranjan7 commented
- GFLOPs remain the same after quantization.
- Compared to full-precision training, memory requirements increase during quantized training.
- The quantized model file is too large: the quantized Swin-T checkpoint is 380 MB, versus 109 MB for the full-precision model.
Please help me understand how you calculated the GFLOPs and whether the model is really quantized.
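For context, here is a rough check I would run (purely a sketch; the checkpoint file name and the "model" key are placeholders, not from this repo) to see whether the saved weights are actually stored as integer tensors or still as FP32 fake-quantized values:

```python
# Sketch only: inspect a saved checkpoint to see whether the weights are stored
# as integers or still as FP32 "shadow" weights from quantization-aware training.
# "quantized_swin_t.pth" and the "model" key are placeholders.
import torch

state_dict = torch.load("quantized_swin_t.pth", map_location="cpu")
if "model" in state_dict:          # some repos wrap the weights under a "model" key
    state_dict = state_dict["model"]

total_bytes = 0
for name, tensor in state_dict.items():
    if torch.is_tensor(tensor):
        total_bytes += tensor.numel() * tensor.element_size()
        print(f"{name}: dtype={tensor.dtype}, shape={tuple(tensor.shape)}")

print(f"Total tensor storage: {total_bytes / 1e6:.1f} MB")
```

If every weight still reports `torch.float32`, the 380 MB file would just be the full-precision shadow weights plus extra quantizer state (scales, buffers), which would also be consistent with the GFLOPs counter not changing, since simulated quantization keeps the matmuls in floating point.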
spbob0418 commented
Same here. I had to reduce the batch size because of the GPU memory limit, which was not the case with the full-precision DeiT model.
I also found that the weights and activations are rescaled right after quantization (why not after the multiplication?).
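To make the question concrete, here is a toy example (my own sketch, not the repo's code; all names are placeholders) of the two orderings I mean for a quantized linear layer:

```python
# Sketch of the two places the scale can be applied; not this repo's code.
import torch

def quantize(x, scale, bits=8):
    qmax = 2 ** (bits - 1) - 1
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax)

x, w = torch.randn(4, 16), torch.randn(16, 16)
sx, sw = x.abs().max() / 127, w.abs().max() / 127

# (a) Rescale right after quantization ("fake quantization"): the matmul then
#     runs in floating point, so FLOP counters and kernels are unchanged.
y_fake = (quantize(x, sx) * sx) @ (quantize(w, sw) * sw)

# (b) Multiply the integer codes first, then apply the combined scale once
#     after the matmul; this is the form an actual INT8 kernel would use.
y_int = (quantize(x, sx) @ quantize(w, sw)) * (sx * sw)

print(torch.allclose(y_fake, y_int, atol=1e-4))  # True: mathematically equivalent
```

The two orderings give the same result, so rescaling right after quantization suggests the code is only simulating quantization rather than running integer arithmetic.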
spbob0418 commented
Also, I cannot find the code that computes the entropy described in the paper's method section anywhere in the quantization code.
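For reference, this is the kind of computation I was expecting to find (a generic sketch of Shannon entropy over the integer codes; I do not know how the paper actually defines it):

```python
# Generic sketch only: Shannon entropy of the empirical distribution of INT8 codes.
# Not the paper's method; offered as a reference point for what I was looking for.
import torch

def code_entropy(q, bits=8):
    """Entropy (in bits) of the empirical distribution of integer codes."""
    counts = torch.bincount((q.flatten() + 2 ** (bits - 1)).long(), minlength=2 ** bits)
    probs = counts.float() / counts.sum()
    probs = probs[probs > 0]                      # drop empty bins (0 * log 0 = 0)
    return -(probs * probs.log2()).sum().item()

q = torch.randint(-128, 128, (1000,))             # pretend these are INT8 weight codes
print(code_entropy(q))                            # near 8 bits for uniform codes
```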