Is the model really Quantized?
navinranjan7 opened this issue · 3 comments
navinranjan7 commented
Is the model really Quantized?
navinranjan7 commented
- GFLOPs remain the same after quantization.
- Compared to full-precision training, memory requirements increase during quantized training.
- The quantized model file is too large: the quantized Swin-T checkpoint is 380 MB, versus 109 MB for the full-precision model.
Please help me understand how you calculated the GFLOPs and whether the model is really quantized.
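For context, here is a rough check I would run (purely a sketch; the checkpoint file name and the "model" key are placeholders, not from this repo) to see whether the saved weights are actually stored as integer tensors or still as FP32 fake-quantized values:

```python
# Sketch only: inspect a saved checkpoint to see whether the weights are stored
# as integers or still as FP32 "shadow" weights from quantization-aware training.
# "quantized_swin_t.pth" and the "model" key are placeholders.
import torch

state_dict = torch.load("quantized_swin_t.pth", map_location="cpu")
if "model" in state_dict:          # some repos wrap the weights under a "model" key
    state_dict = state_dict["model"]

total_bytes = 0
for name, tensor in state_dict.items():
    if torch.is_tensor(tensor):
        total_bytes += tensor.numel() * tensor.element_size()
        print(f"{name}: dtype={tensor.dtype}, shape={tuple(tensor.shape)}")

print(f"Total tensor storage: {total_bytes / 1e6:.1f} MB")
```

If every weight still reports `torch.float32`, the 380 MB file would just be the full-precision shadow weights plus extra quantizer state (scales, buffers), which would also be consistent with the GFLOPs counter not changing, since simulated quantization keeps the matmuls in floating point.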
spbob0418 commented
Same here. I had to reduce the batch size because of the GPU memory limit, which was not the case with the full-precision DeiT model.
I also found that the weights and activations are rescaled right after quantization (why not after the multiplication?).
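To make the question concrete, here is a toy example (my own sketch, not the repo's code; all names are placeholders) of the two orderings I mean for a quantized linear layer:

```python
# Sketch of the two places the scale can be applied; not this repo's code.
import torch

def quantize(x, scale, bits=8):
    qmax = 2 ** (bits - 1) - 1
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax)

x, w = torch.randn(4, 16), torch.randn(16, 16)
sx, sw = x.abs().max() / 127, w.abs().max() / 127

# (a) Rescale right after quantization ("fake quantization"): the matmul then
#     runs in floating point, so FLOP counters and kernels are unchanged.
y_fake = (quantize(x, sx) * sx) @ (quantize(w, sw) * sw)

# (b) Multiply the integer codes first, then apply the combined scale once
#     after the matmul; this is the form an actual INT8 kernel would use.
y_int = (quantize(x, sx) @ quantize(w, sw)) * (sx * sw)

print(torch.allclose(y_fake, y_int, atol=1e-4))  # True: mathematically equivalent
```

The two orderings give the same result, so rescaling right after quantization suggests the code is only simulating quantization rather than running integer arithmetic.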
spbob0418 commented
Also, I cannot find the code that computes the entropy described in the paper's method section anywhere in the quantization code.
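For reference, this is the kind of computation I was expecting to find (a generic sketch of Shannon entropy over the integer codes; I do not know how the paper actually defines it):

```python
# Generic sketch only: Shannon entropy of the empirical distribution of INT8 codes.
# Not the paper's method; offered as a reference point for what I was looking for.
import torch

def code_entropy(q, bits=8):
    """Entropy (in bits) of the empirical distribution of integer codes."""
    counts = torch.bincount((q.flatten() + 2 ** (bits - 1)).long(), minlength=2 ** bits)
    probs = counts.float() / counts.sum()
    probs = probs[probs > 0]                      # drop empty bins (0 * log 0 = 0)
    return -(probs * probs.log2()).sum().item()

q = torch.randint(-128, 128, (1000,))             # pretend these are INT8 weight codes
print(code_entropy(q))                            # near 8 bits for uniform codes
```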