
Vanishing gradients of GANs?

MicPie opened this issue

I have been trying to wrap my head around the explanation of the vanishing-gradients problem of GANs for quite some time:

The current solution pdf document plots the generator loss values over the input of the sigmoid function to explain the (non-)saturating behavior. However, I am asking myself whether that plot only captures the saturation of the sigmoid function, and not the saturation behavior of the G loss function itself.
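To make concrete what I mean by "plotting over the sigmoid input", here is a minimal sketch of that kind of plot (my own code, assuming numpy/matplotlib; the variable names are mine):

```python
import numpy as np
import matplotlib.pyplot as plt

a = np.linspace(-8, 8, 400)       # pre-sigmoid input (logit) of D
d = 1.0 / (1.0 + np.exp(-a))      # D(G(z)) = sigmoid(a)
eps = 1e-12                       # avoid log(0) at the extremes

saturating = np.log(1.0 - d + eps)      # log(1 - D(G(z)))
non_saturating = -np.log(d + eps)       # -log(D(G(z)))

plt.plot(a, saturating, label="saturating: log(1 - D(G(z)))")
plt.plot(a, non_saturating, label="non-saturating: -log(D(G(z)))")
plt.xlabel("sigmoid input a")
plt.ylabel("G loss")
plt.legend()
plt.show()
```

Both curves flatten out for large |a|, which is why I suspect the plot mostly shows the sigmoid saturating rather than a property of the G loss itself.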

The NIPS 2016 GAN tutorial shows in figure 16 (p. 26) an explanation of the saturating loss that does not take the sigmoid function into account. With this explanation, I guess, the saturating behavior is explained through the gradients for G when G is not (yet) able to generate good fakes and D can easily identify them as fake (x = D(G(z)) = 0 or close to 0).
See a plot of the saturating and non-saturating loss functions and their derivatives. There, at x = 0 the saturating loss has a small gradient of around -1, while the non-saturating loss has a gradient of -infinity.
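For concreteness, writing $x = D(G(z))$ (my own notation), the derivatives behind that plot are:

$$
\frac{\mathrm{d}}{\mathrm{d}x}\log(1-x) = -\frac{1}{1-x} \;\xrightarrow{\,x \to 0\,}\; -1,
\qquad
\frac{\mathrm{d}}{\mathrm{d}x}\bigl(-\log x\bigr) = -\frac{1}{x} \;\xrightarrow{\,x \to 0\,}\; -\infty .
$$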
When I plot the gradients over training for both loss functions, I also get higher gradient means and higher standard deviations for the non-saturating loss compared to the saturating loss (see notebook).
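For reference, this is roughly the shape of the experiment in the notebook, as a self-contained toy re-run (assuming PyTorch; the network sizes, the N(4, 1) "real" data, and the 200 D steps are all my own choices, not taken from the notebook):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

# Briefly train D to tell real N(4, 1) samples from G's fakes, so that
# D(G(z)) is pushed towards 0 (the regime where the saturating loss flattens).
opt_d = torch.optim.Adam(D.parameters(), lr=1e-2)
bce = nn.BCELoss()
for _ in range(200):
    z = torch.randn(256, 1)
    real = 4 + torch.randn(256, 1)
    fake = G(z).detach()
    loss_d = bce(D(real), torch.ones(256, 1)) + bce(D(fake), torch.zeros(256, 1))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

# Compare G's gradients under the two losses at this point in training.
z = torch.randn(256, 1)
d_fake = D(G(z))
for name, loss in [("saturating", torch.log(1 - d_fake + 1e-8).mean()),
                   ("non-saturating", -torch.log(d_fake + 1e-8).mean())]:
    G.zero_grad()
    loss.backward(retain_graph=True)
    grads = torch.cat([p.grad.flatten() for p in G.parameters()])
    print(f"{name}: grad mean {grads.mean():+.2e}, std {grads.std():.2e}")
```

If D succeeds in pushing D(G(z)) towards 0, I would expect the saturating loss to show the much smaller G gradients, which is what I see in the notebook.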

Maybe I am missing something?

I would be happy if somebody could point me in the right direction.