jcjohnson/pytorch-examples

In the warm-up example, can you elaborate on the backprop?

LohithBlaze opened this issue · 1 comment

In the backprop for the warm-up example, why is `h_relu.T` required to obtain `grad_w2`? Since dy_pred/dW2 = h_relu and dLoss/dy_pred = 2(y_pred - y), the chain rule gives dLoss/dW2 = h_relu * 2(y_pred - y), not h_relu.T * 2(y_pred - y).

Can you explain why it is `h_relu.T`?
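For context, a minimal NumPy sketch of the shape argument, assuming the layout of the repo's warm-up example (batch `N`, hidden size `H`, output size `D_out`; the concrete dimensions below are arbitrary). It shows that the transpose is what makes the shapes line up, and that `h_relu.T.dot(grad_y_pred)` also sums the per-example gradients over the batch:

```python
import numpy as np

# Arbitrary dimensions in the style of the warm-up example:
# batch N, hidden H, output D_out.
N, H, D_out = 4, 5, 3
rng = np.random.default_rng(0)

h_relu = rng.standard_normal((N, H))       # hidden-layer activations, (N, H)
w2 = rng.standard_normal((H, D_out))       # second-layer weights, (H, D_out)
y = rng.standard_normal((N, D_out))        # targets, (N, D_out)

y_pred = h_relu.dot(w2)                    # forward: (N, H) @ (H, D_out) -> (N, D_out)
grad_y_pred = 2.0 * (y_pred - y)           # dLoss/dy_pred, shape (N, D_out)

# dLoss/dW2 must have the same shape as w2, i.e. (H, D_out).
# h_relu.dot(grad_y_pred) would be (N, H) @ (N, D_out): the inner
# dimensions (H and N) don't even match, so it can't be right.
# h_relu.T.dot(grad_y_pred) is (H, N) @ (N, D_out) -> (H, D_out),
# and it equals the sum over the batch of the per-example outer
# products h_relu[n] (x) grad_y_pred[n].
grad_w2 = h_relu.T.dot(grad_y_pred)

# Same gradient computed example by example, then summed:
per_example = sum(np.outer(h_relu[n], grad_y_pred[n]) for n in range(N))
assert grad_w2.shape == w2.shape
assert np.allclose(grad_w2, per_example)
```

Intuitively, the scalar chain rule dLoss/dW2 = (dLoss/dy_pred)(dy_pred/dW2) still holds, but with matrices the factors have to be arranged (and one of them transposed) so the dimensions compose; the transpose is also what accumulates the gradient across the `N` examples in the batch.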