minimize depends on batch size?
Closed this issue · 1 comment
fancunwei95 commented
Hello,
This is Cunwei. I am interested in FermiNet and its applications. While reading and running the code, we ran into a minor question about the minimize procedure. In qmc.py #264 the code looks like
optimize_step = functools.partial(
    optimizer.minimize,
    features,
    global_step=global_step,
    var_list=self.network.trainable_variables,
    grad_loss=grad_loss_clipped)
My question is whether this gradient computation depends on the batch size when the Adam optimizer is used. I checked the TensorFlow source code (I am not sure which version is required here) and found that the optimizer contracts grad_loss with the gradient of features. It therefore seems that the resulting gradient is extensive, i.e. it grows with the batch size. If it does not, is this scaling handled somewhere else?
Thank you a lot for answering this question.