batch norm initialization
Closed this issue · 2 comments
wdczs commented
In non_local.py, lines 50 and 51:
nn.init.constant(self.W[1].weight, 0)
Why is the weight of the BatchNorm layer initialized to zero?
I worry that the parameters will compute the same gradients during backpropagation and undergo exactly the same updates...
appleleaves commented
Read Section 4.1 of the paper.
AlexHex7 commented
Thanks to @appleleaves.
Section 4.1 of the paper says: "The scale parameter of this BN layer is initialized as zero. This ensures that the initial state of the entire non-local block is an identity mapping, so it can be inserted into any pre-trained networks while maintaining its initial behavior."
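For anyone who comes across this later, here is a minimal sketch of why the zero initialization gives an identity mapping. It is not the repository's exact code; the channel sizes and the tensor y (standing in for the output of the attention part) are made up for illustration. With the BN scale (and bias) set to zero, W(y) is zero for any input, so z = W(y) + x reduces to x.

```python
import torch
import torch.nn as nn

# Illustrative sizes, not the repo's actual configuration.
in_channels, inter_channels = 64, 32

# W is the final projection of the non-local block: 1x1 conv followed by BN.
W = nn.Sequential(
    nn.Conv2d(inter_channels, in_channels, kernel_size=1),
    nn.BatchNorm2d(in_channels),
)

# Zero-init the BN scale (gamma) and bias, so W(y) == 0 for any y.
nn.init.constant_(W[1].weight, 0)
nn.init.constant_(W[1].bias, 0)

x = torch.randn(2, in_channels, 8, 8)      # block input (residual branch)
y = torch.randn(2, inter_channels, 8, 8)   # stand-in for the attention output

z = W(y) + x  # residual connection of the non-local block

print(torch.allclose(z, x))  # True: the block starts as an identity mapping
```

Because the zero scale only silences the block's output at initialization (the conv weights upstream are still initialized randomly), gradients flowing back through the BN scale are not identical across parameters, so the symmetry concern does not apply.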