Undocumented difference from the paper
Hi and thanks for this easy-to-use implementation :)
I was trying to reproduce the original paper's results with your implementation and found the following differences. For the convolution blocks, the original paper states:
> In these tables "C × H × W conv" denotes a convolutional layer with C filters size H × W which is immediately followed by spatial batch normalization [1] and a ReLU nonlinearity.
It seems that your implementation uses

```python
h = self.b1(F.elu(self.c1(x)), test=test)
```

whereas the paper would suggest

```python
h = F.relu(self.b1(self.c1(x), test=test))
```
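For illustration, here is a minimal sketch of a complete conv block in the paper's order, reusing the `c1`/`b1` link names from the snippet above (the class name and constructor arguments are my own assumptions, not code from this repository):

```python
import chainer
import chainer.functions as F
import chainer.links as L

class ConvBlock(chainer.Chain):
    """Convolution -> batch normalization -> ReLU, as in the paper."""
    def __init__(self, in_ch, out_ch, ksize=3, stride=1, pad=1):
        super(ConvBlock, self).__init__(
            c1=L.Convolution2D(in_ch, out_ch, ksize, stride, pad),
            b1=L.BatchNormalization(out_ch),
        )

    def __call__(self, x, test=False):
        # Normalize the raw convolution output first, then apply the
        # ReLU nonlinearity.
        return F.relu(self.b1(self.c1(x), test=test))
```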
Also, the paper adds 40 px of padding before the first convolution layer and then uses no padding in the convolution and residual blocks. That way, the borders of the generated image are implicitly cut off, which removes border artifacts.
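As a sketch of that idea (`image` and `model` are placeholders; the reflection mode follows the later comments in this thread):

```python
import numpy as np

pad = 40  # border added before the first convolution, as in the paper

# Pad the NCHW input so that the unpadded convolutions and residual
# blocks consume the border instead of the image content.
x = np.pad(image, ((0, 0), (0, 0), (pad, pad), (pad, pad)),
           mode='reflect')
y = model(x)

# If the network preserves the spatial size instead, the border has to
# be cropped off explicitly:
# y = y[:, :, pad:-pad, pad:-pad]
```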
My suggestion would be to document these differences in your readme.
Cheers,
Hajo
@fxtentacle
Thanks for your suggestion.
I just want to know: can this suggestion remove the 'dots' and noise? I am eager to solve this problem. Have you done any experiments?
Thank you so much.
In my testing, the early normalization reduced the noisy overexposure a bit, but it didn't help with the dots.
My working guess is that the dots are either the result of interference between the convolution stride and kernel size, or caused by too small a variance penalty during training. But so far I haven't found a solution. I'll report back when (or rather if) I figure it out.
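For reference, the usual form of such a variance penalty is total variation regularization; a minimal Chainer sketch (the helper name and `tv_weight` are made up here):

```python
import chainer.functions as F

def total_variation(y):
    # Squared differences between vertically and horizontally adjacent
    # pixels; penalizing this encourages locally smooth output and
    # should dampen high-frequency dot artifacts.
    dh = y[:, :, 1:, :] - y[:, :, :-1, :]
    dw = y[:, :, :, 1:] - y[:, :, :, :-1]
    return F.sum(dh * dh) + F.sum(dw * dw)

# loss = content_loss + style_loss + tv_weight * total_variation(y)
```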
@fxtentacle
Thanks very much.
I am now training the model with your suggested change and hoping it reduces the noisy overexposure.
I am looking forward to your solution.
Thanks again.
@fxtentacle Thank you for your suggestion. You are right. I didn't notice my mistake with the late batch normalization. I'm debating whether to change the code, since it would break backward compatibility again.
I agree that the normalization order is probably not important enough to warrant breaking old models. But my hope is that we'll also be able to figure out the dot issue and I think that one will need re-training anyway.
I ran some tests with size 3 convolution kernels (like in the paper) but that doesn't affect the dots much.
Training with extreme style images (blank/stripes) shows that the dots depend on the style, so my current working guess is that the dots have somehow been trained into the models.
I'm now running some tests on my cluster using various edge-detection kernels as regularization methods. I'll evaluate around epoch 1, which should be ready in about 2 hours.
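For the curious, here is roughly what such an edge-kernel regularizer could look like (the Laplacian kernel and the helper are my own illustration, not the actual cluster code):

```python
import numpy as np
import chainer.functions as F

# Fixed 3x3 Laplacian edge-detection kernel, shaped (out, in, kh, kw)
# for convolution_2d.
laplacian = np.array([[[[0,  1, 0],
                        [1, -4, 1],
                        [0,  1, 0]]]], dtype=np.float32)

def edge_penalty(y):
    # Fold the channel axis into the batch axis so the single-channel
    # kernel is applied to every channel independently.
    n, c, h, w = y.shape
    flat = F.reshape(y, (n * c, 1, h, w))
    edges = F.convolution_2d(flat, laplacian)
    return F.sum(edges * edges)
```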
BTW I have to say that after working on business apps for a while, I really enjoy the immediate feedback that you get with image algorithms :)
@fxtentacle please also see #68, where the dot issue is discussed.
Another difference from the paper seems to be that they add reflection padding before starting the processing chain and then crop the image before the VGG evaluation. That way, the border artifacts introduced by padding are not included in the loss calculation.
I finally figured out where the border artifacts come from: the original paper also uses reflection padding inside the residual blocks, while this implementation uses zero-padding. As a result, the residual blocks in this implementation always introduce zero values around the border, which leads to the artifacts. Now if only Chainer had reflection padding ...
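As a workaround, reflection padding can be assembled from slicing, `F.flip`, and `F.concat`. A sketch (assuming a Chainer version that provides `F.flip`, and `pad` smaller than the spatial dimensions; `reflect_pad` is a hypothetical helper):

```python
import chainer.functions as F

def reflect_pad(x, pad):
    # Reflection-pad an NCHW variable by `pad` pixels on each side,
    # mirroring interior pixels instead of inserting zeros.
    left = F.flip(x[:, :, :, 1:1 + pad], axis=3)
    right = F.flip(x[:, :, :, -1 - pad:-1], axis=3)
    x = F.concat((left, x, right), axis=3)
    top = F.flip(x[:, :, 1:1 + pad, :], axis=2)
    bottom = F.flip(x[:, :, -1 - pad:-1, :], axis=2)
    return F.concat((top, x, bottom), axis=2)
```

Using something like this inside the residual blocks in place of zero-padding would keep real image statistics at the border.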
@fxtentacle yes, you are right. I added a quick fix for the dot noise / border artifact problem: 431af84