Chapter 13, p. 244: Why is the backprop different for "mul"?
davidjones1 opened this issue · 1 comment
davidjones1 commented
Why was it necessary to define "new" and call it once right after defining it?
I think the Tensor data in this case is multiplied by "other". But why is it different from "add"?
naruto678 commented
Let z = x*y and let the loss be L. The gradient arriving at this node is dL/dz (the `grad` passed into backward). Backprop then computes dL/dx = (dL/dz)*(dz/dx) = grad * y, and likewise dL/dy = (dL/dz)*(dz/dy) = grad * x.
If z = x + y, then dL/dx = (dL/dz)*(dz/dx) = grad * 1 = grad, and similarly for y. That is why during backprop different gradients are passed to the parent tensors depending on which operation was used to create the tensor.
Read (d/dx) as the partial differential operator; I did not see a button for symbol insertion, so this has to suffice, sorry.
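To make the difference concrete, here is a minimal sketch of an autograd Tensor in the spirit of the chapter's class (the names `creators`, `creation_op`, and the single-argument `backward` are simplifications, not the book's exact code). Note how "add" forwards `grad` unchanged to both parents, while "mul" multiplies `grad` by the *other* parent's data:

```python
import numpy as np

class Tensor:
    """Minimal autograd tensor, just enough to show add vs. mul backprop."""
    def __init__(self, data, creators=None, creation_op=None):
        self.data = np.array(data)
        self.grad = None
        self.creators = creators
        self.creation_op = creation_op

    def __add__(self, other):
        return Tensor(self.data + other.data,
                      creators=[self, other], creation_op="add")

    def __mul__(self, other):
        return Tensor(self.data * other.data,
                      creators=[self, other], creation_op="mul")

    def backward(self, grad):
        self.grad = grad
        if self.creation_op == "add":
            # z = x + y: dz/dx = dz/dy = 1, so grad flows through unchanged
            self.creators[0].backward(grad)
            self.creators[1].backward(grad)
        elif self.creation_op == "mul":
            # z = x * y: dz/dx = y and dz/dy = x, so each parent receives
            # the incoming grad times the other parent's data
            x, y = self.creators
            x.backward(Tensor(grad.data * y.data))
            y.backward(Tensor(grad.data * x.data))

# Quick check:
x = Tensor([2.0]); y = Tensor([3.0])
z = x * y
z.backward(Tensor([1.0]))
print(x.grad.data, y.grad.data)  # [3.] [2.]  i.e. grad*y and grad*x
```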
Hope this helps