YanjingLi0202/Q-ViT

Is the model really Quantized?

navinranjan7 opened this issue · 3 comments

Is the model really Quantized?
  1. GFLOPs remain the same after quantization.
  2. Compared to full-precision training, memory requirements increase with quantization during training.
  3. The quantized model size is too large: quantized Swin-T is 380 MB, compared to 109 MB for the full-precision model.

Please help me understand how you calculated the GFLOPs, and whether the model is really quantized. A sketch of what I suspect is happening follows below.
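For context, if the repo uses quantization-aware training with simulated ("fake") quantization, as most QAT codebases do, all three observations would be expected: tensors stay fp32, so profilers report unchanged GFLOPs, training adds extra quantize/dequantize ops and scale parameters, and the checkpoint still stores fp32 latent weights. A minimal illustrative sketch (shapes, scales, and names are made up, not from this repo):

```python
import torch

def fake_quantize(x, scale, num_bits=2):
    """Simulated quantization: snap to an integer grid, then immediately
    dequantize back to float. The tensor stays fp32, so the following
    matmul is counted as full-precision FLOPs and the saved checkpoint
    still holds fp32 latent weights (plus the extra scales)."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    q = torch.clamp(torch.round(x / scale), qmin, qmax)  # integer grid
    return q * scale                                      # back to fp32

# Toy linear layer: the "quantized" weight and activation are still fp32
# tensors of the original shape.
w = torch.randn(64, 64)
a = torch.randn(8, 64)
w_q = fake_quantize(w, scale=w.abs().mean() / 2)
a_q = fake_quantize(a, scale=a.abs().mean() / 2)
out = a_q @ w_q.t()  # executed in fp32 despite the 2-bit value grid
```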

Same here. I had to reduce the batch size because of the GPU memory limit, which is not the case with the full-precision DeiT model.
I also found that the weights and activations are scaled right after quantization (why not after the multiplication?).
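For per-tensor scales the two orderings are mathematically equivalent for a linear layer; applying the combined scale after the multiplication only matters when the matmul actually runs on an integer kernel. A small sketch of what I mean (shapes and scale values are made up):

```python
import torch

# Integer-valued codes for a toy linear layer, with per-tensor scales.
w_int = torch.randint(-2, 2, (64, 64)).float()  # 2-bit weight codes
a_int = torch.randint(-2, 2, (8, 64)).float()   # 2-bit activation codes
s_w, s_a = 0.05, 0.1                            # per-tensor scales

# (a) scale each tensor right after quantization, then matmul (fake-quant form)
out_a = (a_int * s_a) @ (w_int * s_w).t()

# (b) matmul the integer codes first, scale the product afterwards
out_b = (a_int @ w_int.t()) * (s_a * s_w)

print(torch.allclose(out_a, out_b))  # True: same result either way
```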

Also, I cannot find the code that computes the entropy in the quantization code, even though it is described in the paper's method.
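For reference, the quantity described in the paper (information entropy of the quantized representation) would just be the Shannon entropy of the distribution over quantization levels. A rough sketch of how one could measure it; the function and variable names are mine, not from this repo:

```python
import torch

def code_entropy(q_codes, num_bits=2):
    """Shannon entropy (in bits) of the discrete distribution over the
    2**num_bits quantization levels; higher entropy means the few levels
    are all actually being used."""
    levels = 2 ** num_bits
    hist = torch.bincount(q_codes.flatten(), minlength=levels).float()
    p = hist / hist.sum()
    p = p[p > 0]                        # drop empty bins to avoid log(0)
    return -(p * torch.log2(p)).sum()

codes = torch.randint(0, 4, (1000,))    # hypothetical 2-bit activation codes
print(code_entropy(codes))              # close to 2 bits if levels are uniform
```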