jcjohnson/pytorch-examples

pytorch-examples/tensor/two_layer_net_numpy.py backprop

mosicr opened this issue · 1 comment

Shouldn't w2.t() below be grad_w2 instead? Thanks.

grad_y_pred = 2.0 * (y_pred - y)
grad_w2 = h_relu.t().mm(grad_y_pred)
grad_h_relu = grad_y_pred.mm(w2.t())

No, the current implementation is correct. grad_h_relu is the gradient of the loss with respect to h_relu, and by the chain rule, since y_pred = h_relu.mm(w2), that gradient is grad_y_pred.mm(w2.t()): multiplying the (N, D_out) upstream gradient by the (D_out, H) transpose gives the (N, H) shape of h_relu. grad_w2 is the gradient with respect to the weights and plays no role in propagating the gradient back to h_relu. See my derivation of backprop through linear layers here:

http://cs231n.stanford.edu/handouts/linear-backprop.pdf
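
As a quick sanity check, here is a minimal sketch (not from the thread; the shapes N, H, D_out and the random inputs are arbitrary assumptions) that compares the manual gradients from the example against PyTorch autograd:

import torch

# Hypothetical sizes: batch N, hidden H, output D_out.
N, H, D_out = 4, 5, 3
h_relu = torch.randn(N, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)
y = torch.randn(N, D_out)

y_pred = h_relu.mm(w2)            # forward: (N, H) x (H, D_out) -> (N, D_out)
loss = (y_pred - y).pow(2).sum()  # sum-of-squares loss, as in the example
loss.backward()                   # autograd fills h_relu.grad and w2.grad

# Manual backward pass, exactly as in the example code:
grad_y_pred = 2.0 * (y_pred - y)      # (N, D_out)
grad_w2 = h_relu.t().mm(grad_y_pred)  # (H, N) x (N, D_out) -> (H, D_out), shape of w2
grad_h_relu = grad_y_pred.mm(w2.t())  # (N, D_out) x (D_out, H) -> (N, H), shape of h_relu

print(torch.allclose(grad_w2, w2.grad))          # True
print(torch.allclose(grad_h_relu, h_relu.grad))  # True

Note that substituting grad_w2 for w2.t() would not even be shape-compatible in general: grad_w2 has the same (H, D_out) shape as w2 itself, so grad_y_pred.mm(grad_w2) only multiplies when D_out happens to equal H.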