Feasible loss simplification?
florian-boehm opened this issue · 2 comments
florian-boehm commented
Hello, I wonder whether the following simplifications would, in practice, lead to the same result as the original loss functions:
- Use inverted labels everywhere such that D(x) = 0 and D(G(z)) = 1 in the optimal case.
- Drop the logarithm; even the "+1" could be dropped, since constants do not affect the gradient.
- Then one can directly minimize the resulting discriminator loss with a plain gradient descent optimizer.
- The generator loss would look analogous and can be minimized for training the generator (see the sketch after this list).
- If we assume that the discriminator is good enough and produces an output close to 1 for D(G(z)), then vanishing gradients should not be a problem, correct?
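
A rough PyTorch sketch of what I mean (my own illustration with placeholder networks and shapes, not code from this repository):

```python
import torch
import torch.nn as nn

# Placeholder networks and data, just to make the sketch runnable
# (names and sizes are assumptions for illustration only).
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
G = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))
real = torch.randn(32, 2)  # batch of "real" samples
z = torch.randn(32, 8)     # batch of latent vectors

# Discriminator step: with inverted labels (D(x) -> 0, D(G(z)) -> 1)
# and the logarithm dropped, minimize D(x) - D(G(z)).
d_loss = D(real).mean() - D(G(z).detach()).mean()

# Generator step: the generator wants D(G(z)) pushed back towards 0,
# so it simply minimizes D(G(z)).
g_loss = D(G(z)).mean()
```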
Thank you very much for your help!
Florian
ljuvela commented
That starts to look a lot like Wasserstein GAN (see e.g. https://arxiv.org/abs/1704.00028). They also propose additional loss terms to limit the gradient magnitudes in D.
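
For concreteness, a minimal sketch of that gradient penalty (WGAN-GP) in PyTorch; `critic`, `real`, `fake`, and `lambda_gp` are placeholder names of my own, not taken from the paper's code:

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    """WGAN-GP penalty: push the critic's gradient norm towards 1
    on random interpolations between real and fake samples."""
    eps = torch.rand(real.size(0), 1, device=real.device)
    interp = eps * real.detach() + (1.0 - eps) * fake.detach()
    interp.requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True)[0]
    return lambda_gp * ((grads.norm(2, dim=1) - 1.0) ** 2).mean()
```

During training this penalty term is simply added to the critic's loss.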
florian-boehm commented
Thank you for pointing this paper out to me. If I have understood it correctly, one point is worth mentioning:
In the case of WGAN, the activation function in the last layer of the discriminator should be linear. Because the output can no longer be interpreted as a probability, the discriminator is then called a critic.
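
Sketched with placeholder layer sizes of my own (not from the repository), the difference is just the final activation:

```python
import torch.nn as nn

# Standard GAN discriminator: sigmoid output, interpretable as a probability.
discriminator = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                              nn.Linear(64, 1), nn.Sigmoid())

# WGAN critic: same architecture but a linear last layer, so the output
# is an unbounded score rather than a probability.
critic = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
```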