AlexHex7/Non-local_pytorch

batch norm initialization

Closed this issue · 2 comments

wdczs commented

In non_local.py, lines 50-51:
nn.init.constant(self.W[1].weight, 0)
Why is the weight of the BatchNorm layer initialized to zero? I worry that the parameters will compute the same gradients during backpropagation and undergo exactly the same updates...

appleleaves commented

Read Section 4.1 of the paper.

Thanks to @appleleaves.

Section 4.1 of the paper says: "The scale parameter of this BN layer is initialized as zero. This ensures that the initial state of the entire non-local block is an identity mapping, so it can be inserted into any pre-trained networks while maintaining its initial behavior."
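For illustration, here is a minimal sketch (hypothetical class and module names, not the repository's exact code) of why zero-initializing the BN scale makes a residual block behave as an identity mapping at initialization. With gamma = 0 and beta = 0, the BN output is zero regardless of its input, so z = W(y) + x reduces to z = x. The repo line uses the older `nn.init.constant`; the underscore variant `nn.init.constant_` is used here.

```python
import torch
import torch.nn as nn


class TinyResidualBlock(nn.Module):
    """Sketch of a residual block of the form z = W(y) + x, where W = Conv -> BN."""

    def __init__(self, channels=4):
        super().__init__()
        # Stand-in for the block's inner computation (e.g. the non-local operation).
        self.inner = nn.Conv2d(channels, channels, kernel_size=1)
        self.W = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
        )
        # Zero-init the BN scale (gamma) and shift (beta), as described in Sec. 4.1:
        # the BN output is then identically zero, so the block starts as x -> x.
        nn.init.constant_(self.W[1].weight, 0)
        nn.init.constant_(self.W[1].bias, 0)

    def forward(self, x):
        y = self.inner(x)      # arbitrary inner computation
        return self.W(y) + x   # residual connection


if __name__ == "__main__":
    block = TinyResidualBlock().eval()  # eval() so BN does not update running stats
    x = torch.randn(2, 4, 8, 8)
    with torch.no_grad():
        z = block(x)
    print(torch.allclose(z, x))  # True: the freshly initialized block is an identity mapping
```

Note that only the scale of the last BN layer is zeroed; the conv weights before it are initialized normally, so once training starts the gradients flowing into gamma differ per channel and the block can learn a non-trivial residual.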