Justin-Tan/generative-compression

Question about gradient calculation in the quantizer

Opened this issue · 5 comments

  1. I found that you use "tf.stop_gradient()" to deal with the non-differentiable operations "tf.argmin()" and "tf.round()". However, "tf.stop_gradient()" removes the gradient contribution of the node it is applied to, which means your encoder network will not update its parameters, since all the nodes before the quantizer (specifically, the encoder network) are excluded from the gradient calculation.

  2. Are you trying to make the quantizer have a fixed gradient value (such as "1") at all times? If so, I think you have to re-define the gradient of the quantizer rather than use "tf.stop_gradient()" (see the sketch below).
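
To make point 2 concrete, here is a minimal sketch of what I mean by re-defining the gradient. The function name is just for illustration, and it assumes a TensorFlow version that provides "tf.custom_gradient" (older versions would need "tf.RegisterGradient" together with "gradient_override_map" instead):

```python
import tensorflow as tf

@tf.custom_gradient
def quantize_identity_grad(x):
    """Round in the forward pass, but back-propagate a gradient of 1."""
    y = tf.round(x)

    def grad(dy):
        # Pretend the quantizer is the identity: pass the upstream gradient
        # through unchanged instead of the true gradient, which is zero
        # almost everywhere.
        return dy

    return y, grad
```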

I will look into the first one; can you give more details about the second point?

I wonder how to deal with the gradient when I apply operations such as "tf.round". Will setting the gradient of these operations to 1 help? Could you offer some references? Thank you very much!
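
For concreteness, what I have in mind by "a gradient of 1" is the identity trick sketched below. This is not code from this repository, and the function name is mine:

```python
import tensorflow as tf

def round_with_unit_grad(x):
    """Behaves like tf.round in the forward pass but has a gradient of 1.

    Forward:  x + (round(x) - x) == round(x)
    Backward: the term inside tf.stop_gradient contributes nothing, so
    d(output)/d(x) == 1 and gradients still reach the network that produced x.
    """
    return x + tf.stop_gradient(tf.round(x) - x)
```

Note that "tf.stop_gradient" is only applied to the residual here, so unlike wrapping the whole quantizer, the encoder still receives gradients.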

Thanks a lot for your answer. I have another question: I wonder how the batch size affects the results. Have you ever tried a bigger batch size? Thanks again for your reply!

The batch size is limited by GPU memory. In recent papers on learned image compression the batch size is usually set to something low, like 8 or 16, so I imagine that other hyperparameters are more important to tune.