for-ai/CipherGAN

About the calculation of GP loss

wmyw96 opened this issue · 1 comment

Hi ~, I have run your code with the default commands mentioned in README.md.
While tracking the calculation of the GP loss, I found something a bit confusing:

def wasserstein_penalty(discriminator, A_true, A_fake, params,
                        discriminator_params):
  A_interp = sample_along_line(A_true, A_fake, params)
  if params.use_embeddings:
    A_interp = softmax_to_embedding(A_interp, params)
  discrim_A_interp = discriminator(A_interp, discriminator_params, params)
  discrim_A_grads = tf.gradients(discrim_A_interp, [A_interp])

  if params.original_l2:
    l2_loss = tf.sqrt(
        tf.reduce_sum(
            tf.convert_to_tensor(discrim_A_grads)**2, axis=[1, 2]))
    if params.true_lipschitz:
      loss = params.wasserstein_loss * tf.reduce_mean(
          tf.nn.relu(l2_loss - 1)**2)
    else:
      loss = params.wasserstein_loss * tf.reduce_mean((l2_loss - 1)**2)
  else:
    loss = params.wasserstein_loss * (tf.nn.l2_loss(discrim_A_grads) - 1)**2
  return loss

When A_interp has shape [64, 100, 256], which can be annotated as [batch_size, seq_len, input_dim], and discrim_A_interp has shape [64, 2, 1], tf.gradients returns a list with one element, so tf.convert_to_tensor(discrim_A_grads) has shape [1, 64, 100, 256]. Shouldn't reduce_sum then be applied along axis [2, 3] rather than axis [1, 2]?
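
A minimal, hypothetical shape check (not taken from the repo) illustrating the extra leading axis; it assumes TF 1.x-style graph mode via tf.compat.v1, and the quadratic discrim_out is just a stand-in for the discriminator:

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

A_interp = tf.zeros([64, 100, 256])                       # [batch_size, seq_len, input_dim]
discrim_out = tf.reduce_sum(A_interp ** 2, axis=[1, 2])   # stand-in for the discriminator output
grads = tf.gradients(discrim_out, [A_interp])             # a Python list with one tensor
print(tf.convert_to_tensor(grads).shape)                  # (1, 64, 100, 256): stacking adds a leading axis of size 1

# A per-example L2 norm over (seq_len, input_dim) therefore needs
# axis=[2, 3] on this stacked tensor (or axis=[1, 2] on grads[0]).
l2 = tf.sqrt(tf.reduce_sum(tf.convert_to_tensor(grads) ** 2, axis=[2, 3]))
print(l2.shape)                                           # (1, 64): one norm per example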

Thanks for pointing that out! I've pushed a fix for the bug. I don't expect it to throw off the results; however, let us know if any hyperparameters need retuning to compensate for the change in scale.
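
For anyone reading along, here is a sketch of one way to compute the penalty with a per-example gradient norm (an illustration only, not necessarily the exact patch that was pushed):

# Drop the list wrapper returned by tf.gradients, then take one L2 norm per example.
grad = tf.gradients(discrim_A_interp, [A_interp])[0]          # [batch_size, seq_len, input_dim]
l2_norm = tf.sqrt(tf.reduce_sum(grad ** 2, axis=[1, 2]))      # [batch_size]
loss = params.wasserstein_loss * tf.reduce_mean((l2_norm - 1) ** 2)  # standard WGAN-GP penalty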