jcjohnson/pytorch-examples

In the warm-up example, can you elaborate on the backprop?

LohithBlaze opened this issue · 1 comment

In the backprop for the warm-up example, why is `h_relu.T` required to obtain `grad_w2`? Since dy_pred/dW2 = h_relu and dLoss/dy_pred = 2(y_pred - y), the chain rule gives dLoss/dW2 = h_relu * 2(y_pred - y), not h_relu.T * 2(y_pred - y).

Can you explain why it is `h_relu.T`?
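For context, a minimal NumPy sketch of the shape argument, assuming the layout of the repo's warm-up example (batch `N`, hidden size `H`, output size `D_out`; the concrete dimensions below are arbitrary). It shows that the transpose is what makes the shapes line up, and that `h_relu.T.dot(grad_y_pred)` also sums the per-example gradients over the batch:

```python
import numpy as np

# Arbitrary dimensions in the style of the warm-up example:
# batch N, hidden H, output D_out.
N, H, D_out = 4, 5, 3
rng = np.random.default_rng(0)

h_relu = rng.standard_normal((N, H))       # hidden-layer activations, (N, H)
w2 = rng.standard_normal((H, D_out))       # second-layer weights, (H, D_out)
y = rng.standard_normal((N, D_out))        # targets, (N, D_out)

y_pred = h_relu.dot(w2)                    # forward: (N, H) @ (H, D_out) -> (N, D_out)
grad_y_pred = 2.0 * (y_pred - y)           # dLoss/dy_pred, shape (N, D_out)

# dLoss/dW2 must have the same shape as w2, i.e. (H, D_out).
# h_relu.dot(grad_y_pred) would be (N, H) @ (N, D_out): the inner
# dimensions (H and N) don't even match, so it can't be right.
# h_relu.T.dot(grad_y_pred) is (H, N) @ (N, D_out) -> (H, D_out),
# and it equals the sum over the batch of the per-example outer
# products h_relu[n] (x) grad_y_pred[n].
grad_w2 = h_relu.T.dot(grad_y_pred)

# Same gradient computed example by example, then summed:
per_example = sum(np.outer(h_relu[n], grad_y_pred[n]) for n in range(N))
assert grad_w2.shape == w2.shape
assert np.allclose(grad_w2, per_example)
```

Intuitively, the scalar chain rule dLoss/dW2 = (dLoss/dy_pred)(dy_pred/dW2) still holds, but with matrices the factors have to be arranged (and one of them transposed) so the dimensions compose; the transpose is also what accumulates the gradient across the `N` examples in the batch.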