mosicr opened this issue 6 years ago · 1 comment
Shouldn't `w2.t()` below be `grad_w2` instead? Thanks.

```python
grad_y_pred = 2.0 * (y_pred - y)
grad_w2 = h_relu.t().mm(grad_y_pred)
grad_h_relu = grad_y_pred.mm(w2.t())
```
No, the current implementation is correct. Since `y_pred = h_relu.mm(w2)`, the chain rule gives `grad_h_relu = grad_y_pred.mm(w2.t())`: it is the transpose of `w2` (not `grad_w2`) that maps the output gradient back to the hidden layer, while `grad_w2` is the gradient with respect to the weights themselves. See my derivation of backprop through linear layers here:
http://cs231n.stanford.edu/handouts/linear-backprop.pdf
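For anyone who wants to convince themselves numerically, here is a minimal sketch that checks the manual gradients from the snippet above against autograd. The tensor names mirror the snippet; the shapes `N, H, D_out` are illustrative assumptions, not values from the original example.

```python
import torch

# Illustrative shapes (assumptions, not from the original example).
N, H, D_out = 4, 5, 3

h_relu = torch.randn(N, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)
y = torch.randn(N, D_out)

# Forward pass and squared-error loss, as in the example.
y_pred = h_relu.mm(w2)
loss = (y_pred - y).pow(2).sum()
loss.backward()

with torch.no_grad():
    # The manual gradients from the snippet in question.
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)   # dL/dw2 = h_relu^T @ dL/dy_pred
    grad_h_relu = grad_y_pred.mm(w2.t())   # dL/dh_relu = dL/dy_pred @ w2^T

# Both match autograd, confirming w2.t() (not grad_w2) is correct here.
print(torch.allclose(grad_w2, w2.grad))          # True
print(torch.allclose(grad_h_relu, h_relu.grad))  # True
```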