MahanFathi/CS231

L2 regularization for the gradient

YukeWang96 opened this issue · 1 comment

hello,

I'm just wondering whether the gradient of the L2 regularization term should also be considered in the third question of NetworkVisualization-TensorFlow. Otherwise, the variable Xi remains unused. The computed gradient should be something like dx[0] - 2 * l2_reg * Xi instead of dx[0].

Hey,

That's correct: the regularization term should contribute its share to the gradient, and that's exactly what's happening here. Since the L2 penalty is part of the loss, TensorFlow differentiates it automatically, hence the name 'autodiff.' Note also that tf.nn.l2_loss(x) computes sum(x ** 2) / 2, so the regularization's contribution to the gradient is -l2_reg * model.image, with no extra factor of 2. Correct me if I'm looking at the wrong piece of code:

loss = model.classifier[0, target_y] - l2_reg * tf.nn.l2_loss(model.image) # scalar loss
grad = tf.gradients(loss, model.image) # list with the gradient of loss w.r.t. model.image, same shape as model.image
dx = sess.run(grad, feed_dict={model.image: X})
X += dx[0] * learning_rate
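
For completeness, here is a minimal sketch (assuming TensorFlow 1.x with its session-based API; the placeholder x, the l2_reg value, and the test values are made up for illustration) that checks autodiff already produces the -l2_reg * x regularization term, so no hand-written correction is needed:

import numpy as np
import tensorflow as tf  # assumes TF 1.x, matching the session-based code above

l2_reg = 1e-3  # hypothetical regularization strength
x = tf.placeholder(tf.float32, shape=(4,))
loss = -l2_reg * tf.nn.l2_loss(x)  # tf.nn.l2_loss(x) = sum(x ** 2) / 2
grad = tf.gradients(loss, x)  # list with one tensor: d(loss)/dx

with tf.Session() as sess:
    x_val = np.array([1.0, -2.0, 3.0, -4.0], dtype=np.float32)
    dx = sess.run(grad, feed_dict={x: x_val})
    # autodiff gives exactly -l2_reg * x, not -2 * l2_reg * x
    print(np.allclose(dx[0], -l2_reg * x_val))  # prints True

The same thing happens with the full loss above: the -l2_reg * tf.nn.l2_loss(model.image) term folds its gradient into dx[0] automatically.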

Best,
Mahan