Why d_loss = 0.5 * np.add(d_loss_real, d_loss_fake) ?
mrgloom opened this issue · 7 comments
I wonder why it's d_loss = 0.5 * np.add(d_loss_real, d_loss_fake)
and not d_loss = np.add(d_loss_real, d_loss_fake)?
https://github.com/eriklindernoren/Keras-GAN/blob/master/gan/gan.py#L123
The losses from real and fake images are averaged. CMIIW, I guess this is how loss is calculated in the paper, but if we just sum the losses we might get the same result.
The discriminator uses BCE loss:
BCE = - y * log(y_pred) - (1 - y) * log(1 - y_pred)
As I understand it, we can rewrite this code https://github.com/eriklindernoren/Keras-GAN/blob/master/gan/gan.py#L121-L122 as d_loss = self.discriminator.train_on_batch([gen_imgs, imgs], [fake, valid]), where by [gen_imgs, imgs] I mean concatenation.
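A minimal NumPy sketch of the BCE formula above, assuming equal-sized real and fake half-batches. The `bce` helper and the random `pred_real` / `pred_fake` values are illustrative stand-ins, not code from gan.py. For fixed predictions, averaging the two half-batch losses gives the same number as one BCE over the concatenated batch:

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over a batch (hypothetical helper)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-y_true * np.log(y_pred)
                         - (1.0 - y_true) * np.log(1.0 - y_pred)))

rng = np.random.default_rng(0)
batch_size = 32

# Stand-ins for discriminator outputs on real and generated images.
pred_real = rng.uniform(0.05, 0.95, size=(batch_size, 1))
pred_fake = rng.uniform(0.05, 0.95, size=(batch_size, 1))
valid = np.ones((batch_size, 1))
fake = np.zeros((batch_size, 1))

d_loss_real = bce(valid, pred_real)
d_loss_fake = bce(fake, pred_fake)
averaged = 0.5 * np.add(d_loss_real, d_loss_fake)

# One BCE over the concatenated batch of 2 * batch_size samples.
combined = bce(np.concatenate([fake, valid]),
               np.concatenate([pred_fake, pred_real]))

print(averaged, combined)
```

This only compares loss values for fixed predictions. In actual training, two separate train_on_batch calls perform two weight updates, while a single call on the concatenated batch performs one, so the two variants are not guaranteed to train identically.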
"I guess this is how loss is calculated in the paper"
Can you point out where it's specified in the paper?
I found this comment in the pix2pix paper: "In addition, we divide the objective by 2 while optimizing D, which slows down the rate at which D learns relative to G."
and in the CycleGAN paper: "In practice, we divide the objective by 2 while optimizing D, which slows down the rate at which D learns, relative to the rate of G."
Actually, I have tested it without the 0.5 on a simple dataset like the parabola from here, and it still works.
Actually, I was not able to break it even with d_loss = 100000.0 * np.add(d_loss_real, d_loss_fake). As I understand it, in this Keras code the factor does not affect the training procedure; it just averages the metrics for display.
Multiplying the loss by a constant has the same effect as changing the learning rate. While training a GAN, we try to make the generator and the discriminator learn at the same pace.
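A small sketch of that equivalence, assuming plain gradient descent on a toy one-parameter least-squares problem (the data, `grad_mse`, and `sgd` are made up for illustration). Scaling the loss by a constant scales the gradient by the same constant, so the update matches one with a rescaled learning rate:

```python
import numpy as np

# Toy data for y = 2 * x; the single weight w should move toward 2.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

def grad_mse(w):
    """Gradient of mean((x * w - y) ** 2) with respect to w."""
    return float(np.mean(2.0 * (x * w - y) * x))

def sgd(w, lr, loss_scale, steps=20):
    """Plain gradient descent on loss_scale * MSE."""
    for _ in range(steps):
        # d(loss_scale * loss)/dw = loss_scale * d(loss)/dw
        w = w - lr * loss_scale * grad_mse(w)
    return w

w_scaled_loss = sgd(0.0, lr=0.01, loss_scale=0.5)  # loss divided by 2
w_scaled_lr = sgd(0.0, lr=0.005, loss_scale=1.0)   # learning rate divided by 2

print(w_scaled_loss, w_scaled_lr)
```

This holds for plain SGD. Adaptive optimizers such as Adam normalize each update by a running estimate of the gradient magnitude, so a constant factor on the loss largely cancels there and the equivalence need not hold.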
I mean, as I understand it, in Keras if you want to apply a weight to a loss you should use loss_weights in compile https://github.com/eriklindernoren/Keras-GAN/blob/master/gan/gan.py#L29-L31:
loss_weights: Optional list or dictionary specifying scalar
coefficients (Python floats) to weight the loss contributions
of different model outputs.
The loss value that will be minimized by the model
will then be the *weighted sum* of all individual losses,
weighted by the `loss_weights` coefficients.
If a list, it is expected to have a 1:1 mapping
to the model's outputs. If a tensor, it is expected to map
output names (strings) to scalar coefficients.
But if you have already obtained the metrics, you are just multiplying them by a constant here https://github.com/eriklindernoren/Keras-GAN/blob/master/gan/gan.py#L123 and that does not affect the training process. I mean this multiplication is not 'part of the graph' as it would be, for example, in TensorFlow.
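A rough way to check this point without Keras: the sketch below uses a hypothetical `TinyDiscriminator` (a NumPy logistic regression, not the model from gan.py) whose `train_on_batch` updates the weights and returns the loss as a plain float. Multiplying that returned value afterwards, by 0.5 or by 100000.0, leaves the learned weights unchanged:

```python
import numpy as np

class TinyDiscriminator:
    """Hypothetical stand-in for a compiled model: logistic regression + SGD."""

    def __init__(self, n_features, lr=0.1):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr

    def train_on_batch(self, x, y):
        # Forward pass and BCE loss.
        p = 1.0 / (1.0 + np.exp(-(x @ self.w + self.b)))
        p_c = np.clip(p, 1e-7, 1.0 - 1e-7)
        loss = float(np.mean(-y * np.log(p_c) - (1.0 - y) * np.log(1.0 - p_c)))
        # The weight update happens here, before the loss is returned.
        grad = p - y
        self.w -= self.lr * (x.T @ grad) / len(y)
        self.b -= self.lr * float(np.mean(grad))
        return loss

def run(display_scale, steps=50):
    rng = np.random.default_rng(0)  # same data for every run
    d = TinyDiscriminator(n_features=2)
    d_loss = 0.0
    for _ in range(steps):
        real = rng.normal(1.0, 1.0, size=(16, 2))
        generated = rng.normal(-1.0, 1.0, size=(16, 2))
        d_loss_real = d.train_on_batch(real, np.ones(16))
        d_loss_fake = d.train_on_batch(generated, np.zeros(16))
        # Computed after both updates; only used for display.
        d_loss = display_scale * np.add(d_loss_real, d_loss_fake)
    return d.w.copy(), d.b, float(d_loss)

w_half, b_half, loss_half = run(0.5)
w_big, b_big, loss_big = run(100000.0)

print(np.array_equal(w_half, w_big), loss_half, loss_big)
```

Only the reported number changes between the two runs; to actually reweight what the optimizer sees, the factor would have to enter before the gradient is computed, which is what loss_weights in compile is described as doing.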
I came here with the same question.