L2 regularization for the gradient
YukeWang96 opened this issue · 1 comment
YukeWang96 commented
Hello,

I wonder whether the L2 regularization should also be accounted for in the gradient in the third question of NetworkVisualization-TensorFlow. Otherwise, the variable `Xi` remains unused. The computed gradient should be `dx[0] - 2 * l2_reg * Xi` instead of just `dx[0]`.
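In code, the fix I have in mind would look roughly like this (a standalone sketch with made-up shapes and values; `score_grad` stands in for `dx[0]`, and the penalty is assumed to be `l2_reg * sum(X ** 2)`):

```python
import numpy as np

l2_reg = 1e-3          # made-up value for illustration
learning_rate = 25.0   # made-up value for illustration
X = np.random.randn(1, 64, 64, 3).astype(np.float32)       # stand-in for the image
score_grad = np.random.randn(*X.shape).astype(np.float32)  # stands in for dx[0]

# Subtract the gradient of l2_reg * sum(X ** 2) by hand before the ascent step
X += learning_rate * (score_grad - 2 * l2_reg * X)
```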
MahanFathi commented
Hey,
That's correct, the regularization term does contribute to the gradient, and that is exactly what happens here: since the loss already includes the regularization term, TensorFlow computes its gradient automatically, hence the name 'auto diff'. Correct me if I'm looking at the wrong piece of code:
```python
loss = model.classifier[0, target_y] - l2_reg * tf.nn.l2_loss(model.image)  # scalar loss
grad = tf.gradients(loss, model.image)  # gradient of loss w.r.t. model.image (a list, hence dx[0] below)
dx = sess.run(grad, feed_dict={model.image: X})
X += dx[0] * learning_rate
```
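If you want to convince yourself, here is a minimal standalone check (TF1-style; the placeholder name and shape are made up for illustration) that autodiff already emits the regularization term's gradient. Note that `tf.nn.l2_loss(t)` computes `sum(t ** 2) / 2`, so the analytic term is `-l2_reg * X` rather than `-2 * l2_reg * X`:

```python
import numpy as np
import tensorflow as tf

l2_reg = 1e-3
img = tf.placeholder(tf.float32, shape=[1, 8])  # stand-in for model.image
reg_loss = -l2_reg * tf.nn.l2_loss(img)         # tf.nn.l2_loss(t) = sum(t ** 2) / 2
auto_grad = tf.gradients(reg_loss, img)[0]      # gradient produced by autodiff
manual_grad = -l2_reg * img                     # hand-derived gradient of reg_loss

with tf.Session() as sess:
    X = np.random.randn(1, 8).astype(np.float32)
    g_auto, g_manual = sess.run([auto_grad, manual_grad], feed_dict={img: X})
    print(np.allclose(g_auto, g_manual))        # True: no manual correction needed
```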
Best,
Mahan