yusuketomoto/chainer-fast-neuralstyle

Why does FastStyleNet add w = math.sqrt(2) in ResidualBlock?

Opened this issue · 2 comments

Thanks for reading my problem.

While reading FastStyleNet, I found that the Convolution2D layers are passed w = math.sqrt(2). The code is below:

    import math

    import chainer
    import chainer.links as L

    class ResidualBlock(chainer.Chain):
        def __init__(self, n_in, n_out, stride=1, ksize=3):
            w = math.sqrt(2)
            super(ResidualBlock, self).__init__(
                c1=L.Convolution2D(n_in, n_out, ksize, stride, 1, w),
                c2=L.Convolution2D(n_out, n_out, ksize, 1, 1, w),
                b1=L.BatchNormalization(n_out),
                b2=L.BatchNormalization(n_out)
            )

I have checked Convolution2D's source code; the parameter w is a scale (wscale).

What I don't understand is why it is set to sqrt(2). Could it be 1?

Thanks very much.

http://docs.chainer.org/en/stable/_modules/chainer/links/connection/convolution_2d.html#Convolution2D

wscale is only used for the initializer.

So this w is the scale used when initializing the weights with Gaussian noise. It matters only at initialization; during training and inference of the model it is irrelevant. My guess is that the actual value was chosen more or less empirically as a trade-off between initial noisiness and training time.
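For intuition, here is a small numpy sketch of what such a scale does. It assumes the initializer draws weights from a Gaussian with std = wscale / sqrt(fan_in), which is how older Chainer versions used the wscale argument (this is a simplified reimplementation, not Chainer's actual code). Under that assumption, wscale = sqrt(2) gives std = sqrt(2 / fan_in), i.e. the He initialization commonly recommended for ReLU networks:

```python
import math
import numpy as np

def init_conv_weight(n_in, n_out, ksize, wscale=1.0):
    # Gaussian init: std = wscale / sqrt(fan_in). This mirrors the role
    # of wscale in older Chainer versions (an assumption for illustration).
    rng = np.random.default_rng(0)
    fan_in = n_in * ksize * ksize
    std = wscale * math.sqrt(1.0 / fan_in)
    return rng.normal(0.0, std, size=(n_out, n_in, ksize, ksize)).astype(np.float32)

# wscale = sqrt(2) doubles the variance relative to wscale = 1,
# matching He initialization (std = sqrt(2 / fan_in)) for ReLU layers.
w1 = init_conv_weight(128, 128, 3, wscale=1.0)
w2 = init_conv_weight(128, 128, 3, wscale=math.sqrt(2))
print(w1.std(), w2.std())  # w2's std is sqrt(2) times w1's
```

Since only the scale of the distribution changes, a larger wscale means louder initial activations (and outputs), while everything else about the layer is identical.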

If you're willing to wait longer, you could try setting it even lower, so that the network starts out with a lower response (= more gray) and less noise, but it might then take longer to learn to produce full-amplitude outputs.

@fxtentacle
Thanks a lot. I understand it now.