pulp-platform/nemo

Is bias quantized during fake quantization

Msabih opened this issue · 1 comments

Hi,

I printed the unique values of the weight and bias tensors of a PACT Conv2d layer to understand how it works, and it appears that the bias values are not quantized. Also, what if the bias is in a BatchNorm layer (as is common in many PyTorch models) instead of the Conv layer? How are those bias values used in PACT BatchNorm?

Hi @Msabih, the bias values in PACT_Conv2d are not quantized. In fact, in the FakeQuantized stage, you only constrain the representation of the weights of ConvNd / Linear layers and of the outputs of the activation functions.
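To make the "unique values" observation concrete, here is a minimal sketch in plain Python. It assumes a generic symmetric uniform quantizer, which is not NEMO's actual PACT quantizer; `fake_quantize` and the tensor sizes are hypothetical and only illustrate why quantized weights show few unique values while an untouched bias does not.

```python
import random

def fake_quantize(values, bits):
    """Generic symmetric uniform fake quantization (illustrative, not NEMO's PACT).

    Each value is snapped to the nearest point of a uniform grid but stays a float.
    """
    max_abs = max(abs(v) for v in values)
    levels = 2 ** (bits - 1) - 1              # e.g. 7 positive levels for 4 bits
    step = max_abs / levels if max_abs > 0 else 1.0
    return [round(v / step) * step for v in values]

random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(1000)]  # stand-in for conv weights
bias = [random.uniform(-1.0, 1.0) for _ in range(16)]       # stand-in for conv bias

# Only the weights go through the quantizer; the bias is left as it is.
q_weights = fake_quantize(weights, bits=4)

n_unique_w = len(set(q_weights))  # bounded by the grid size, 2 * 7 + 1 = 15
n_unique_b = len(set(bias))       # one distinct value per bias element
print(n_unique_w, n_unique_b)
```

Printing the unique values of a fake-quantized layer should show the same pattern: a small, bounded set for the weights and unconstrained values for the bias.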
Usually we "unfold" biases into the following BatchNorm layer, whose parameters are quantized (without any retraining) in the next stage, Quantized/Deployable. Keep in mind that, in general, while you can quantize weights to a low bitwidth, BN parameters are quantized with more relaxed bitwidths (32 bits). The reason for this choice is twofold: 1) lowering the bitwidth of the BN parameters would require fine-tuning the network with the same tricks we use for Conv weights, but in our experience keeping the BN real-valued during fine-tuning actually boosts accuracy; 2) the hardware advantage of quantizing BN parameters (and therefore also biases) to a low bitwidth is small compared with the accuracy loss.
I put more details on the computational model we use here: https://arxiv.org/abs/2004.05930
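As a rough illustration of moving a convolution bias into the following BatchNorm, here is a sketch in plain Python for a single channel in inference mode. The names `batchnorm`, `absorb_bias`, `kappa` and `lam` are my own and not taken from NEMO's API; the arithmetic is the standard BN formula applied to `x + bias`.

```python
import math

def batchnorm(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode BatchNorm for one channel."""
    return gamma * (x - mean) / math.sqrt(var + eps) + beta

def absorb_bias(bias, gamma, beta, mean, var, eps=1e-5):
    """Rewrite BN(x + bias) as kappa * x + lam, so the conv needs no bias."""
    kappa = gamma / math.sqrt(var + eps)
    lam = beta + kappa * (bias - mean)
    return kappa, lam

# Hypothetical values for a single output channel.
conv_out_no_bias = 0.37   # convolution result before adding the bias
bias = 0.12
gamma, beta, mean, var = 1.5, -0.2, 0.05, 0.8

reference = batchnorm(conv_out_no_bias + bias, gamma, beta, mean, var)
kappa, lam = absorb_bias(bias, gamma, beta, mean, var)
absorbed = kappa * conv_out_no_bias + lam

# The two paths agree up to floating-point rounding.
print(abs(reference - absorbed) < 1e-9)
```

After this rewrite, only `kappa` and `lam` remain to be quantized in the later stage, which is where the more relaxed bitwidth mentioned above would apply.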

FYI, in NEMO v0.0.7 there seems to be an issue with biases when we do BN folding instead of keeping the BNs. I am investigating this behavior.