Chapter 13, p. 244: Why is the backprop different for "mul"?
davidjones1 opened this issue · 1 comment
davidjones1 commented
Why was it necessary to define "new" and call it once right after defining it?
I think the Tensor data in this case is multiplied by "other". But why is it different from "add"?
naruto678 commented
Let z = x*y and let the loss be L. The gradient arriving at this node is dL/dz (the `grad` passed into backward). Backprop then computes dL/dx = (dL/dz)*(dz/dx) = grad * y, and likewise dL/dy = (dL/dz)*(dz/dy) = grad * x.
If z = x + y, then dL/dx = (dL/dz)*(dz/dx) = grad * 1 = grad, and similarly for y. That is why during backprop different gradients are passed to the parent tensors depending on which operation was used to create the tensor.
Read (d/dx) as the partial differential operator; I did not see a button for symbol insertion, so this has to suffice, sorry.
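To make the difference concrete, here is a minimal sketch of an autograd Tensor in the spirit of the chapter's class (the names `creators`, `creation_op`, and the single-argument `backward` are simplifications, not the book's exact code). Note how "add" forwards `grad` unchanged to both parents, while "mul" multiplies `grad` by the *other* parent's data:

```python
import numpy as np

class Tensor:
    """Minimal autograd tensor, just enough to show add vs. mul backprop."""
    def __init__(self, data, creators=None, creation_op=None):
        self.data = np.array(data)
        self.grad = None
        self.creators = creators
        self.creation_op = creation_op

    def __add__(self, other):
        return Tensor(self.data + other.data,
                      creators=[self, other], creation_op="add")

    def __mul__(self, other):
        return Tensor(self.data * other.data,
                      creators=[self, other], creation_op="mul")

    def backward(self, grad):
        self.grad = grad
        if self.creation_op == "add":
            # z = x + y: dz/dx = dz/dy = 1, so grad flows through unchanged
            self.creators[0].backward(grad)
            self.creators[1].backward(grad)
        elif self.creation_op == "mul":
            # z = x * y: dz/dx = y and dz/dy = x, so each parent receives
            # the incoming grad times the other parent's data
            x, y = self.creators
            x.backward(Tensor(grad.data * y.data))
            y.backward(Tensor(grad.data * x.data))

# Quick check:
x = Tensor([2.0]); y = Tensor([3.0])
z = x * y
z.backward(Tensor([1.0]))
print(x.grad.data, y.grad.data)  # [3.] [2.]  i.e. grad*y and grad*x
```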
Hope this helps