
Question regarding layer equalization

I noticed that you adjust the weight and bias of the batchnorm layer by multiplying them by s.


Since layer equalization happens after batchnorm folding, the weight and bias of the conv layer should already have been updated with the folded batchnorm parameters. I am wondering why it is still necessary to update the batchnorm parameters (multiplying them by s) here. Thanks a lot!
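For reference, here is a minimal sketch of batchnorm folding in plain Python. It uses a hypothetical per-channel affine layer y = w*x + b in place of a real conv, and the names fold_bn and bn are illustrative, not taken from the repository. After folding, gamma and beta are no longer applied anywhere. However, if the batchnorm running statistics match its input, the folded output of channel c has mean beta[c] and std |gamma[c]|, which is why they remain useful as summary statistics of the conv output.

```python
import math


def bn(z, gamma, beta, mean, var, eps=1e-5):
    """Batchnorm applied to a single value z of one channel (inference mode)."""
    return gamma * (z - mean) / math.sqrt(var + eps) + beta


def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BN parameters into a per-channel affine layer y = w*x + b.

    Every argument is a list with one entry per output channel. Returns (w_f, b_f)
    such that w_f[c]*x + b_f[c] == bn(w[c]*x + b[c], ...) for every x.
    """
    w_f, b_f = [], []
    for wc, bc, g, bt, m, v in zip(w, b, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)   # per-channel multiplier from BN
        w_f.append(wc * scale)           # scale the weight
        b_f.append((bc - m) * scale + bt)  # shift and scale the bias
    return w_f, b_f
```

The folded layer reproduces conv-then-BN exactly, so gamma and beta can afterwards be kept purely as bookkeeping (for example, as fake_weight and fake_bias).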

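The bn_weight here is not actually involved in the computation graph. It's just a vector that keeps track of the output feature mean (fake_bias) and std (fake_weight), which are used to compute the value range for feature quantization. Because equalization rescales each channel of the conv output, these statistics have to be rescaled by the same factor s; otherwise the quantization range would no longer match the actual feature values.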
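A minimal sketch of why the tracked statistics must be rescaled. It assumes the convention implied above: equalization multiplies output channel i of the first layer by s[i] > 0 and divides the matching weights of the next layer by s[i]. The DFQ paper writes the same operation with the reciprocal factor, so only the convention differs. The two-layer network, the function names, and the "mean ± k·std, clipped at 0" range estimate are all hypothetical and are not the repository's actual code.

```python
import math


def relu(v):
    return max(0.0, v)


def forward(x, w1, b1, w2):
    # Layer 1: per-channel affine (BN already folded) followed by ReLU.
    h = [relu(w * x + b) for w, b in zip(w1, b1)]
    # Layer 2: weighted sum over the channels of layer 1.
    return sum(w * hi for w, hi in zip(w2, h))


def equalize(w1, b1, w2, fake_weight, fake_bias, s):
    # Hypothetical convention: channel i of layer 1 is multiplied by s[i] > 0.
    # ReLU is positively homogeneous (relu(s*z) == s*relu(z) for s > 0),
    # so dividing layer 2's matching weight by s[i] keeps the output unchanged.
    w1_eq = [w * si for w, si in zip(w1, s)]
    b1_eq = [b * si for b, si in zip(b1, s)]
    w2_eq = [w / si for w, si in zip(w2, s)]
    # fake_weight/fake_bias describe channel i's output, so they scale by s[i] too.
    fw_eq = [fw * si for fw, si in zip(fake_weight, s)]
    fb_eq = [fb * si for fb, si in zip(fake_bias, s)]
    return w1_eq, b1_eq, w2_eq, fw_eq, fb_eq


def act_range(mean, std, k=3.0):
    # Hypothetical range estimate: mean +/- k*|std|, clipped at 0 by the ReLU.
    # abs() is needed because a BN gamma (used as std) may be negative.
    return max(0.0, mean - k * abs(std)), max(0.0, mean + k * abs(std))


if __name__ == "__main__":
    w1, b1, w2 = [2.0, 0.5], [0.1, -0.2], [1.0, 3.0]
    fake_weight, fake_bias = [1.5, -0.8], [0.4, 2.0]
    s = [0.5, 4.0]
    w1_eq, b1_eq, w2_eq, fw_eq, fb_eq = equalize(w1, b1, w2, fake_weight, fake_bias, s)
    for x in [-1.0, 0.3, 2.0]:
        # The network function is unchanged by equalization.
        assert math.isclose(forward(x, w1, b1, w2), forward(x, w1_eq, b1_eq, w2_eq))
    for i in range(2):
        # The activation range of channel i scales by exactly s[i].
        _, hi = act_range(fake_bias[i], fake_weight[i])
        _, hi_eq = act_range(fb_eq[i], fw_eq[i])
        assert math.isclose(hi_eq, hi * s[i])
```

The network output is unchanged, while the range of channel i scales by s[i]. If fake_weight and fake_bias were left unscaled, the quantizer would still use the pre-equalization range.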

