batch norm initialization
Closed this issue · 2 comments
wdczs commented
In non_local.py, lines 50 and 51:
nn.init.constant(self.W[1].weight, 0)
Why is the weight of the BatchNorm layer initialized to zero?
I worry that the parameters will compute the same gradients during backpropagation and undergo exactly the same updates...
appleleaves commented
Read Section 4.1 of the paper.
AlexHex7 commented
Thanks to @appleleaves.
Section 4.1 of the paper says: "The scale parameter of this BN layer is initialized as zero. This ensures that the initial state of the entire non-local block is an identity mapping, so it can be inserted into any pre-trained networks while maintaining its initial behavior."
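For anyone who comes across this later, here is a minimal sketch of why the zero initialization gives an identity mapping. It is not the repository's exact code; the channel sizes and the tensor y (standing in for the output of the attention part) are made up for illustration. With the BN scale (and bias) set to zero, W(y) is zero for any input, so z = W(y) + x reduces to x.

```python
import torch
import torch.nn as nn

# Illustrative sizes, not the repo's actual configuration.
in_channels, inter_channels = 64, 32

# W is the final projection of the non-local block: 1x1 conv followed by BN.
W = nn.Sequential(
    nn.Conv2d(inter_channels, in_channels, kernel_size=1),
    nn.BatchNorm2d(in_channels),
)

# Zero-init the BN scale (gamma) and bias, so W(y) == 0 for any y.
nn.init.constant_(W[1].weight, 0)
nn.init.constant_(W[1].bias, 0)

x = torch.randn(2, in_channels, 8, 8)      # block input (residual branch)
y = torch.randn(2, inter_channels, 8, 8)   # stand-in for the attention output

z = W(y) + x  # residual connection of the non-local block

print(torch.allclose(z, x))  # True: the block starts as an identity mapping
```

Because the zero scale only silences the block's output at initialization (the conv weights upstream are still initialized randomly), gradients flowing back through the BN scale are not identical across parameters, so the symmetry concern does not apply.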