For a GAN, why does my D loss increase while my G loss decreases to 0 at the beginning?
hefeiwangyande opened this issue · 10 comments
The generated picture is noise.
step: 4650,G_loss_adv: 0.325, G_accuracy: 0.984,
D_loss_adv: 0.982, d_loss_pos: 0.598, d_loss_neg: 1.366,
D_accuracy: 0.258, d_pos_acc: 0.500, d_neg_acc: 0.016
My G_loss is less than my D_loss, and the generated samples score significantly higher than the real pictures, so D is completely abnormal (normally D_loss should be small, since D should be able to distinguish real from fake, right?). My D is four convs plus a fully connected layer. I don't know where the mistake is.
Please fix your message, it is not readable.
And I doubt anyone will spend hours trying to debug your code, please come with a precise question.
G_loss_adv = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=d_fake_logit, labels=tf.ones_like(d_fake_logit)), name='g_loss')
d_loss_pos = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=d_real_logit, labels=tf.ones_like(d_real_logit)), name='d_loss_real')
d_loss_neg = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=d_fake_logit, labels=tf.zeros_like(d_fake_logit)), name='d_loss_fake')
D_loss_adv = tf.add(.5 * d_loss_pos, .5 * d_loss_neg, name='d_loss')

# about accuracy
d_pos_acc = tf.reduce_mean(tf.cast(score_real > 0.5, tf.float32), name='accuracy_real')
d_neg_acc = tf.reduce_mean(tf.cast(score_fake < 0.5, tf.float32), name='accuracy_fake')
d_accuracy = tf.add(.5 * d_pos_acc, .5 * d_neg_acc, name='accuracy')
g_accuracy = tf.reduce_mean(tf.cast(score_fake > 0.5, tf.float32), name='accuracy')
In your implementation it looks like d_loss_fake should be different from g_loss_adv.
Assuming that:
1) G is the generator, outputting a fake image from a noise vector z,
2) D is the discriminator, outputting the probability that its input is real,
one gets:
g_loss_adv = D(G(z)) and d_loss_fake = 1 - D(G(z))
@DEKHTIARJonathan Thanks for your suggestion, I have updated the message.
@rafaelvalle Your suggestion is:
d_loss_neg = tf.reduce_mean(1 - tf.nn.sigmoid_cross_entropy_with_logits(
    logits=d_fake_logit, labels=tf.zeros_like(d_fake_logit)), name='d_loss_fake')
Actually, d_fake_logit = D(G(z)) in my implementation. For input noise z passed through G and then D, the value should be relatively small, close to 0, so I think d_loss_neg is not wrong. Or is my understanding wrong?
Let's look at the positive (real) and negative (adversarial) losses one by one. Consider that D outputs the probability of the input being real.
A) If d_loss_pos is minimized using D(x) and the labels for x are 1, D minimizes its loss by trying to make D(x) closer to 1.
B) If d_loss_neg is minimized using D(G(z)) and the labels for G(z) are 0, D minimizes its loss by trying to make D(G(z)) closer to 0.
Your problem could be that
1) the labels for x and G(z) are the same instead of 1 and 0 respectively.
2) If that's not the problem, it could be that using D(G(z)) has vanishing gradients early on and people prefer to use 1 - D(G(z)).
Now let's assume you have 1 and 2 correct, where else could the problem be?
Note that in your code below the generator and the discriminator have the same function to minimize. This is not correct, as they should minimize different loss functions. That's why I suggested changing g_loss_adv to 1 - d_fake_logit.
g_loss_adv = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=d_fake_logit, labels=tf.ones_like(d_fake_logit)), name='g_loss')
d_loss_neg = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=d_fake_logit, labels=tf.zeros_like(d_fake_logit)), name='d_loss_fake')
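If you want to check the vanishing-gradient hypothesis directly, here is a minimal sketch (TF 1.x) for logging the norm of the generator's adversarial gradient; it assumes the G_loss_adv tensor defined above and a list of generator variables like the generator_vars shown later in this thread.
import tensorflow as tf

# Gradient of the generator's adversarial loss w.r.t. the generator's own weights.
# G_loss_adv and generator_vars are assumed to already exist in the graph.
g_grads = tf.gradients(G_loss_adv, generator_vars)
g_grad_norm = tf.global_norm([g for g in g_grads if g is not None])

# Run g_grad_norm each step next to the losses; a norm that collapses toward 0
# early in training is consistent with the vanishing-gradient explanation.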
@rafaelvalle I think I partly understand your point: you mean that my D(G(z)) is too large and hard to reduce, possibly because of vanishing gradients, so I should maximize 1 - D(G(z)) instead?
In addition, the parameters of d_loss_neg and g_loss_adv are not exactly the same:
g_loss_adv = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=d_fake_logit, labels=tf.ones_like(d_fake_logit)), name='g_loss')
d_loss_neg = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=d_fake_logit, labels=tf.zeros_like(d_fake_logit)), name='d_loss_fake')
Oh, I missed the ones_like, zeros_like! Sorry for not reading carefully.
There are many things that could be the reason:
- Loss function: try using 1 - D(G(z)) instead.
- Time: wait for a few iterations until training converges to a specific behavior, for example the generator always wins.
- Learning rates: try adjusting them such that the losing part has a higher learning rate.
- Discriminator vs. generator number of iterations: try adjusting them such that the losing part gets more iterations (a sketch combining this with the learning-rate idea follows this list).
- Weight initialization: try Xavier uniform with the gain set according to the non-linearity.
- Noise vector: try using uniform noise instead of normal noise.
- Model capacity: try increasing the losing part's capacity.
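Here is a minimal sketch (TF 1.x) of the learning-rate and iteration-count suggestions above. The learning-rate values, n_d_steps, num_steps and the feed-dict helpers are placeholders I made up; discriminator_vars / generator_vars are the variable lists shown later in this thread, and D_loss_adv / G_loss_adv are the losses from the code above.
import tensorflow as tf

# Give the two networks their own optimizers so the learning rates can differ.
d_opt = tf.train.AdamOptimizer(learning_rate=1e-4)   # slow down D if it keeps winning
g_opt = tf.train.AdamOptimizer(learning_rate=4e-4)   # speed up G if it keeps losing
d_train_op = d_opt.minimize(D_loss_adv, var_list=discriminator_vars)
g_train_op = g_opt.minimize(G_loss_adv, var_list=generator_vars)

n_d_steps = 1  # raise this if the discriminator is the losing part
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(num_steps):
        for _ in range(n_d_steps):
            sess.run(d_train_op, feed_dict=next_discriminator_batch())  # real + fake batch
        sess.run(g_train_op, feed_dict=next_generator_batch())          # noise batch only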
Report back here if you find what the problem was, so that we all learn.
I can also use label smoothing for the discriminator to weaken it.
I have submitted a PR to TensorFlow to help you implement this feature: tensorflow/tensorflow#16153
You can use it as inspiration to write your own custom code.
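For the label-smoothing idea, here is a minimal sketch of one-sided smoothing (my own illustration, not the code from that PR), reusing the d_real_logit and d_fake_logit tensors from the code above.
import tensorflow as tf

smooth = 0.9  # real labels become 0.9 instead of 1.0, which slightly weakens D
d_loss_pos = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=d_real_logit, labels=smooth * tf.ones_like(d_real_logit)), name='d_loss_real')
d_loss_neg = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    logits=d_fake_logit, labels=tf.zeros_like(d_fake_logit)), name='d_loss_fake')
D_loss_adv = tf.add(.5 * d_loss_pos, .5 * d_loss_neg, name='d_loss')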
I encountered the same issue, and finally found that I had forgotten to restrict each optimizer to ONLY the variables of its own discriminator or generator part. If you don't pass something like var_list=generator_vars, the generator's optimizer will weaken the discriminator by updating its parameters as well.
discriminator_vars = [var for var in tf.global_variables() if "discriminator" in var.name]
generator_vars = [var for var in tf.global_variables() if "generator" in var.name]
self.D_optimizer = tf.train.AdamOptimizer(learning_rate=2e-4).minimize(self.D_loss, var_list=discriminator_vars)
self.G_optimizer = tf.train.AdamOptimizer(learning_rate=2e-4).minimize(self.G_loss, var_list=generator_vars)
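If the networks are built inside tf.variable_scope blocks named "generator" and "discriminator" (an assumption about your graph), an alternative sketch is to collect only the trainable variables of each scope instead of filtering tf.global_variables() by name:
import tensorflow as tf

generator_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="generator")
discriminator_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="discriminator")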