Why does FastStyleNet set w = math.sqrt(2) in ResidualBlock?
Thanks for reading my question.
While reading FastStyleNet, I noticed that the Convolution2D layers in ResidualBlock are passed w = math.sqrt(2). The code is below:
    import math

    import chainer
    import chainer.links as L


    class ResidualBlock(chainer.Chain):
        def __init__(self, n_in, n_out, stride=1, ksize=3):
            # Scale passed to both convolutions as the sixth positional argument.
            w = math.sqrt(2)
            super(ResidualBlock, self).__init__(
                c1=L.Convolution2D(n_in, n_out, ksize, stride, 1, w),
                c2=L.Convolution2D(n_out, n_out, ksize, 1, 1, w),
                b1=L.BatchNormalization(n_out),
                b2=L.BatchNormalization(n_out)
            )
I have checked the source code of Convolution2D, and the parameter w is a scale.
What I don't understand is why it is set to sqrt(2). Could it be 1?
Thanks very much.
wscale is only used for the initializer.
So this w is the scale used for initializing the weights with Gaussian noise. It matters only at initialization; during training and execution of the model it becomes irrelevant. My guess would be that the actual value was chosen more or less empirically, as a trade-off between initial noisiness and training time.
If you're willing to wait longer, you could try setting it even lower, so that the NN starts out with a lower response (= more gray) but also with less noise, but then it might take longer for the NN to learn to produce full-amplitude outputs.
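To make the role of the scale concrete, here is a minimal sketch in plain Python rather than Chainer itself. It assumes the initial weights are drawn from a zero-mean Gaussian whose standard deviation is wscale * sqrt(1 / fan_in), with fan_in = n_in * ksize * ksize; the helper init_conv_weights is hypothetical and only illustrates that assumption. If that assumption holds, wscale = sqrt(2) yields a standard deviation of sqrt(2 / fan_in), which matches the initialization commonly recommended for ReLU layers, though nothing in this thread confirms that this was the motivation.

```python
import math
import random
import statistics


def init_conv_weights(n_in, n_out, ksize, wscale=1.0, seed=0):
    """Draw initial convolution weights from a zero-mean Gaussian.

    Assumption (not taken from the thread): the standard deviation is
    wscale * sqrt(1 / fan_in), where fan_in = n_in * ksize * ksize.
    """
    rng = random.Random(seed)
    fan_in = n_in * ksize * ksize
    std = wscale * math.sqrt(1.0 / fan_in)
    count = n_out * fan_in
    return [rng.gauss(0.0, std) for _ in range(count)]


# Same layer shape and seed, two different scales.
w_one = init_conv_weights(32, 32, 3, wscale=1.0)
w_sqrt2 = init_conv_weights(32, 32, 3, wscale=math.sqrt(2))

# The scale changes only the spread of the starting values.
ratio = statistics.pstdev(w_sqrt2) / statistics.pstdev(w_one)
print(round(ratio, 3))
```

With a shared seed, the two weight sets differ only by the factor sqrt(2), so the printed ratio is about 1.414. Setting wscale to 1 would therefore be a valid choice too; it just starts training from smaller weights, and nothing else in the layer depends on the value once the weights have been drawn.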
@fxtentacle Thanks a lot. I understand it now.