soumith/ganhacks

For a GAN, why does my D loss increase and my G loss decrease to 0 at the beginning?

hefeiwangyande opened this issue · 10 comments

The generated picture is noise.

    step: 4650, G_loss_adv: 0.325, G_accuracy: 0.984,
    D_loss_adv: 0.982, d_loss_pos: 0.598, d_loss_neg: 1.366,
    D_accuracy: 0.258, d_pos_acc: 0.500, d_neg_acc: 0.016
My G_loss is less than my D_loss, and the generated samples score significantly higher than the real pictures, so D is completely abnormal (normally D_loss should be small and D should be able to distinguish real from fake, right?). My D uses four convs + a fully connected layer. I don't know what is wrong.

Please fix your message; it is not readable.

And I doubt anyone will spend hours trying to debug your code; please come with a precise question.

    # Generator adversarial loss: push D(G(z)) towards 1
    G_loss_adv = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_fake_logit, labels=tf.ones_like(d_fake_logit)), name='g_loss')

    # Discriminator losses: real images labelled 1, generated images labelled 0
    d_loss_pos = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_real_logit, labels=tf.ones_like(d_real_logit)), name='d_loss_real')
    d_loss_neg = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_fake_logit, labels=tf.zeros_like(d_fake_logit)), name='d_loss_fake')
    D_loss_adv = tf.add(.5 * d_loss_pos, .5 * d_loss_neg, name='d_loss')

    # Accuracies, thresholding the scores at 0.5
    d_pos_acc = tf.reduce_mean(tf.cast(score_real > 0.5, tf.float32), name='accuracy_real')
    d_neg_acc = tf.reduce_mean(tf.cast(score_fake < 0.5, tf.float32), name='accuracy_fake')
    d_accuracy = tf.add(.5 * d_pos_acc, .5 * d_neg_acc, name='accuracy')

    g_accuracy = tf.reduce_mean(tf.cast(score_fake > 0.5, tf.float32), name='accuracy')
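
Here score_real and score_fake are the discriminator's sigmoid scores, roughly:

    score_real = tf.nn.sigmoid(d_real_logit)
    score_fake = tf.nn.sigmoid(d_fake_logit)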

In your implementation it looks like d_loss_fake should be different from g_loss_adv.

Assuming that:
1) G is the generator, outputting a fake image from a noise vector z
2) D is the discriminator, outputting the probability that its input is real,

one gets:
g_loss_adv = D(G(z)) and d_loss_fake = 1 - D(G(z))

@DEKHTIARJonathan Thanks for your suggestion, I have changed the information.

@rafaelvalle Your suggestion is:

    d_loss_neg = tf.reduce_mean(1 - tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_fake_logit, labels=tf.zeros_like(d_fake_logit)), name='d_loss_fake')

Actually, d_fake_logit = D(G(z)) in my implementation. The noise z goes through G and then D, so its value should be relatively small, close to 0. So I think d_loss_neg is not wrong, or is my understanding wrong?

Let's look at the positive (real) and negative (adversarial) losses one by one, assuming D outputs the probability that its input is real.

A) if d_loss_pos is minimized using D(x) and the labels for x are 1, D minimizes its loss by trying to make D(x) closer to 1.
B) if d_loss_neg is minimized using D(G(z)) and the labels for G(z) are 0, D minimizes its loss by trying to make D(G(z)) closer to 0.
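
In other words, with sigmoid cross-entropy and those labels, the two terms work out to roughly the following (assuming D's score is the sigmoid of its logit):

    d_loss_pos = -log(D(x))          # real images, labels = 1
    d_loss_neg = -log(1 - D(G(z)))   # generated images, labels = 0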

Your problem could be that
1) the labels for x and G(z) are the same instead of 1 and 0 respectively.
2) If that's not the problem, it could be that using D(G(z)) has vanishing gradients early on and people prefer to use 1 - D(G(z)).

Now let's assume you have 1 and 2 correct; where else could the problem be?
Notice that in your code below the generator and the discriminator have the same function to minimize. This is not correct, as they should minimize different loss functions. That's why I suggested changing g_loss_adv to 1 - d_fake_logit.

    g_loss_adv = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_fake_logit, labels=tf.ones_like(d_fake_logit)), name='g_loss')
    d_loss_neg = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_fake_logit, labels=tf.zeros_like(d_fake_logit)), name='d_loss_fake')

@rafaelvalle I think I partly understand your point: you mean that my D(G(z)) is too large and hard to reduce, possibly because of vanishing gradients, so I should maximize 1 - D(G(z)) instead?

In addition, the parameters of d_loss_neg and g_loss_adv are not exactly the same:

    g_loss_adv = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_fake_logit, labels=tf.ones_like(d_fake_logit)), name='g_loss')
    d_loss_neg = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_fake_logit, labels=tf.zeros_like(d_fake_logit)), name='d_loss_fake')

Oh, I missed the ones_like, zeros_like! Sorry for not reading carefully.
There are many things that could be the reason:

  1. Loss function
    Try using 1 - D(G(z)) instead
  2. Time
    Wait for a few iterations until training converges to a specific behavior, for example generator always wins.
  3. Learning rates
    Try adjusting them so that the losing part has a higher learning rate (see the sketch after this list)
  4. Discriminator number of iterations vs Generator number of iterations
    Try adjusting them so that the losing part runs more iterations
  5. Weight initialization
    Try Xavier uniform initialization with the gain set according to the non-linearity
  6. Noise vector
    Try using uniform noise instead of normal noise
  7. Model capacity
    Try increasing the losing part's capacity

Report here if you find what the problem was such that we all learn.
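
A rough sketch of items 3 and 6 above, assuming two separate Adam optimizers; names such as D_loss, G_loss, d_vars, g_vars, batch_size and z_dim are placeholders for whatever you already have in your graph:

    # 3. Give the losing network a higher learning rate, e.g. if D is losing
    d_opt = tf.train.AdamOptimizer(learning_rate=4e-4).minimize(D_loss, var_list=d_vars)
    g_opt = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(G_loss, var_list=g_vars)

    # 6. Sample the noise vector from a uniform distribution instead of a normal one
    z = tf.random_uniform([batch_size, z_dim], minval=-1.0, maxval=1.0)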

I can also use label smoothing for the discriminator to weaken it.
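
For example, something like this (a rough sketch based on my d_loss_pos above, with 0.9 as a typical smoothing value):

    # One-sided label smoothing: soften only the targets for real images
    d_loss_pos = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=d_real_logit, labels=0.9 * tf.ones_like(d_real_logit)), name='d_loss_real')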

I have submitted a PR on TF to help you implement this feature: tensorflow/tensorflow#16153

You can take inspiration from it to write your own custom code.

I encountered the same issue, and finally found that I had forgotten to use ONLY the gradients of the discriminator part or the generator part. If you don't pass something like var_list=generator_vars, the optimizer will weaken the discriminator's ability by updating its parameters as well.

    # Collect each sub-network's variables by name
    discriminator_vars = [var for var in tf.global_variables() if "discriminator" in var.name]
    generator_vars = [var for var in tf.global_variables() if "generator" in var.name]

    # Each optimizer updates only its own network's variables
    self.D_optimizer = tf.train.AdamOptimizer(learning_rate=2e-4).minimize(self.D_loss, var_list=discriminator_vars)
    self.G_optimizer = tf.train.AdamOptimizer(learning_rate=2e-4).minimize(self.G_loss, var_list=generator_vars)
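
An equivalent approach, assuming the two networks are built inside tf.variable_scope("discriminator") and tf.variable_scope("generator") blocks (a sketch, not from the code above), is to pull only the trainable variables of each scope:

    # Only the trainable variables of each sub-network, selected by scope name
    discriminator_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="discriminator")
    generator_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="generator")

This also avoids picking up non-trainable variables (e.g. optimizer slots or batch-norm moving averages) that tf.global_variables() would include.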