MadryLab/cifar10_challenge

PGD steps along the sign of the gradient

SohamTamba opened this issue · 1 comment

More of a question than an issue.

It can be inferred from here that PGD steps along the sign of the gradient.

Is there any reason it does not simply step along the gradient?
i.e. `x += gradient(x) * step_size` instead of `x += sign(gradient(x)) * step_size`

Thanks

dtsip commented

The idea is that you want to move each pixel by at most step_size in each iteration while maximizing the loss. In other words, you want to move along the gradient direction as much as possible without changing any pixel by more than step_size. If you think about it, this corresponds exactly to moving by step_size * sign(grad): to maximize the first-order change in the loss, grad^T delta, subject to every coordinate of delta being at most step_size in magnitude, you push each coordinate to its extreme, so if the gradient of a pixel is positive you add step_size, and if it is negative you subtract it. In the convex optimization literature this is known as steepest descent with respect to the L_infinity norm.
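
To make the step concrete, here is a minimal NumPy sketch of one such L_infinity PGD iteration. The function and variable names are placeholders of my own, and pixels are assumed to lie in [0, 255] as with raw CIFAR-10 inputs; this is an illustration, not the repo's actual code:

```python
import numpy as np

def linf_pgd_step(x, x_orig, grad, step_size, epsilon):
    """One hypothetical L_infinity PGD step.

    x         : current adversarial image
    x_orig    : the original clean image
    grad      : gradient of the loss w.r.t. x
    step_size : per-pixel step size for this iteration
    epsilon   : radius of the L_infinity ball around x_orig
    """
    # L_infinity steepest ascent: move every pixel by exactly step_size
    # in whichever direction locally increases the loss.
    x = x + step_size * np.sign(grad)
    # Project back onto the epsilon-ball around the original image.
    x = np.clip(x, x_orig - epsilon, x_orig + epsilon)
    # Keep the result a valid image (assuming pixels in [0, 255]).
    return np.clip(x, 0.0, 255.0)
```

Note that the sign step moves every pixel by the same amount regardless of the gradient's scale, which is part of what makes a fixed step_size easy to pick.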

In principle, you could also just take steps along the gradient. However, we found that this takes much longer to converge and makes tuning the learning rate harder.
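
For comparison, here is a sketch of the variant the question proposes, reusing the hypothetical names and the `np` import from the sketch above:

```python
def raw_gradient_step(x, x_orig, grad, step_size, epsilon):
    # Step along the raw gradient instead of its sign. The effective
    # step now scales with the gradient's magnitude, which can vary by
    # orders of magnitude across images and iterations; that is the
    # tuning difficulty described above.
    x = x + step_size * grad
    x = np.clip(x, x_orig - epsilon, x_orig + epsilon)
    return np.clip(x, 0.0, 255.0)
```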