minimize depends on batch size?
Closed this issue · 1 comment
fancunwei95 commented
Hello,
This is Cunwei. I am interested in FermiNet and its applications. While reading and running the code, we ran into a minor question about the minimize procedure. In qmc.py #264 the code looks like
optimize_step = functools.partial(
    optimizer.minimize,
    features,
    global_step=global_step,
    var_list=self.network.trainable_variables,
    grad_loss=grad_loss_clipped)
My question is whether this gradient computation depends on the batch size when the Adam optimizer is used. I checked the TensorFlow source code (I am not sure which version is required here) and found that the optimizer contracts grad_loss with the gradient of features. It therefore seems that the resulting gradient is extensive, i.e. it grows with the batch size. If it does not, is this scaling handled somewhere else?
Thank you a lot for answering this question.