Vanishing gradients?
netheril96 opened this issue · 1 comments
The derivative of the sigmoid is very small when the scores are far from zero, which is why sigmoid activations have been all but abandoned in deep learning. In the original GAN, the logarithm of the sigmoid is used as the loss function, and the derivative of the logarithm is large enough to cancel out the vanishing gradient.
If I understand correctly, in LSGAN the generator loss is the sum of squares of 1 minus the sigmoid of the discriminator output. That seems to suffer from vanishing gradients as well, and indeed in my cursory experiments the network loss never goes down. How do you overcome that?
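To make the concern concrete, here is a rough derivation for a single sample. It assumes $s$ is the raw discriminator score, $\sigma$ is the sigmoid, the original GAN generator loss is taken in its non-saturating form $-\log\sigma(s)$, and the loss described in the question is $(1-\sigma(s))^2$ with no constant factor. These symbols and forms are assumptions for illustration, not taken from any particular implementation.

$$
\frac{d}{ds}\bigl[-\log\sigma(s)\bigr] = -\frac{\sigma'(s)}{\sigma(s)} = -\bigl(1-\sigma(s)\bigr) \;\longrightarrow\; -1 \quad \text{as } s \to -\infty
$$

$$
\frac{d}{ds}\bigl(1-\sigma(s)\bigr)^2 = -2\bigl(1-\sigma(s)\bigr)\,\sigma'(s) = -2\,\sigma(s)\bigl(1-\sigma(s)\bigr)^2 \;\longrightarrow\; 0 \quad \text{as } s \to \pm\infty
$$

Under these assumptions, the logarithm divides out the small factor $\sigma(s)$, while the squared loss on top of a sigmoid keeps it, so its gradient shrinks exactly when the discriminator confidently rejects a generated sample.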
Quite late to the party, I have to admit, but I will explain it for anyone else watching the thread, since even Google Search is showing your issue.
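In LSGANs the discriminator's output layer is linear: there is no sigmoid, and the least-squares loss is applied to the raw score. As a result, the vanishing gradient issue is mitigated, because the gradient of a squared error grows with the distance from the target instead of saturating.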
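A minimal sketch of the difference, using hand-derived gradients for a single raw discriminator score `s`. The function names, the target value of 1, and the omission of any constant factor in the losses are assumptions made for illustration; this is not code from the LSGAN repository.

```python
import math


def sigmoid(s):
    """Logistic sigmoid of a raw discriminator score."""
    return 1.0 / (1.0 + math.exp(-s))


def grad_log_sigmoid_loss(s):
    """d/ds of -log(sigmoid(s)): the non-saturating GAN generator loss."""
    return -(1.0 - sigmoid(s))


def grad_squared_sigmoid_loss(s):
    """d/ds of (1 - sigmoid(s))**2: least squares on top of a sigmoid output."""
    p = sigmoid(s)
    return -2.0 * (1.0 - p) * p * (1.0 - p)


def grad_squared_linear_loss(s):
    """d/ds of (1 - s)**2: least squares on a linear output."""
    return -2.0 * (1.0 - s)


if __name__ == "__main__":
    # Compare the three gradients at scores far from and close to zero.
    for s in (-10.0, -2.0, 0.0, 2.0):
        print(
            f"s={s:6.1f}  "
            f"log-sigmoid={grad_log_sigmoid_loss(s):+.5f}  "
            f"squared-sigmoid={grad_squared_sigmoid_loss(s):+.5f}  "
            f"squared-linear={grad_squared_linear_loss(s):+.5f}"
        )
```

At a strongly negative score, the squared loss on a sigmoid output yields a gradient close to zero, while the same loss on a linear output yields a gradient proportional to the distance from the target. This is consistent with the stalled training reported above when a sigmoid is kept in front of the least-squares loss.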