karpathy/nn-zero-to-hero

zero_grad implementation correctness question

zivanfi opened this issue · 3 comments

Hi,

First of all, thanks for the great video tutorial! In the video at 2:11:48 you describe that the grad values need to be reset to zero between iterations, by assigning zero to every parameter's grad.

I may be misunderstanding this part, but it seems to me that one would need to zero the grad values for all nodes, not just for the ones representing the parameters; otherwise, the grads of the internal nodes between the parameters would keep accumulating across iterations.

I may easily be mistaken, so I'm phrasing this issue as a question rather than as a bug report. Is it enough to zero_grad only the parameters, and if so, why?
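For reference, the loop I have in mind looks roughly like this (a minimal sketch of the training loop from the video, assuming a micrograd-style `model` with a `parameters()` method; `xs`/`ys` are placeholder training data):

```python
for step in range(100):
    # forward pass: builds a fresh graph of Value nodes every time
    ypred = [model(x) for x in xs]
    loss = sum((yout - ygt) ** 2 for ygt, yout in zip(ys, ypred))

    # zero the grads -- but only for the parameters
    for p in model.parameters():
        p.grad = 0.0

    # backward pass
    loss.backward()

    # gradient descent update
    for p in model.parameters():
        p.data += -0.05 * p.grad
```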

Thanks!

The arithmetic nodes are recreated on each iteration during the forward pass, so we don't need to worry about their gradients carrying over into future iterations: only the parameter Value objects persist across iterations, while the intermediate nodes from the previous pass become unreachable and are discarded, and the fresh ones are created with grad initialized to zero. The network lives through multiple iterations, but the node tree created by invoking the network in a forward pass is only used for a single backward pass.
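Concretely, here is a minimal sketch of the point (assuming the micrograd `Value` class, whose constructor initializes grad to 0; `a` and `b` stand in for parameters and `c` for an internal node):

```python
from micrograd.engine import Value

a = Value(2.0)  # "parameters": these objects survive across iterations
b = Value(3.0)

for step in range(3):
    # forward pass: c is a brand-new Value every iteration,
    # created with its grad initialized to 0
    c = a * b

    # zero only the persistent parameter grads
    a.grad = 0.0
    b.grad = 0.0

    # backward pass accumulates into a.grad and b.grad
    c.backward()

    # the previous iteration's c, accumulated grad and all, is now
    # unreachable and garbage collected; nothing on it carries over

print(a.grad, b.grad)  # 3.0 2.0 -- the same after every iteration
```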

Ah, I see, thank you!

Thanks! I had the same question, but this explains it.