In the warm-up example, can you elaborate on the backprop?
LohithBlaze opened this issue · 1 comment
LohithBlaze commented
In the backprop for the warm-up example, why is h_relu.T required to obtain grad_w2? Since dy_pred/dw2 = h_relu and dloss/dy_pred = 2(y_pred - y), the chain rule would seem to give dloss/dw2 = h_relu * 2(y_pred - y), not h_relu.T * 2(y_pred - y).
Can you explain why h_relu.T is needed?
jcjohnson commented
I have written up a derivation here: http://cs231n.stanford.edu/handouts/linear-backprop.pdf
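For context, here is a minimal NumPy sketch of the point in that derivation (shapes and variable names follow the warm-up example's convention, but the specific dimensions below are made up for illustration). With y_pred = h_relu.dot(w2), the chain rule for a matrix product Y = H W gives dloss/dW = Hᵀ (dloss/dY); an elementwise product h_relu * grad_y_pred is not even shape-compatible:

```python
import numpy as np

np.random.seed(0)

# Assumed shapes: h_relu is (N, H), w2 is (H, D_out), y is (N, D_out)
N, H, D_out = 4, 5, 3
h_relu = np.random.randn(N, H)       # ReLU activations
w2 = np.random.randn(H, D_out)
y = np.random.randn(N, D_out)

y_pred = h_relu.dot(w2)              # (N, D_out)
grad_y_pred = 2.0 * (y_pred - y)     # dloss/dy_pred, (N, D_out)

# h_relu * grad_y_pred would multiply (N, H) by (N, D_out): undefined.
# The transpose makes the shapes line up and matches the chain rule:
grad_w2 = h_relu.T.dot(grad_y_pred)  # (H, N) @ (N, D_out) -> (H, D_out)
assert grad_w2.shape == w2.shape

# Sanity check one entry against a numerical gradient of
# loss = sum((h_relu @ w2 - y)**2):
eps = 1e-6
i, j = 2, 1
w2_bump = w2.copy()
w2_bump[i, j] += eps
loss = np.square(h_relu.dot(w2) - y).sum()
loss_bump = np.square(h_relu.dot(w2_bump) - y).sum()
num_grad = (loss_bump - loss) / eps
assert abs(num_grad - grad_w2[i, j]) < 1e-3 * max(1.0, abs(grad_w2[i, j]))
```

The transpose is therefore not an extra rule bolted on; it falls out of writing the loss as a function of the matrix w2 and differentiating, exactly as the linked handout derives.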