thu-ml/Prior-Guided-RGF

Question about alpha estimation in code

inkawhich opened this issue

Thanks for releasing the code for this. I have a quick question about the 'est_alpha' variable here (https://github.com/thu-ml/Prior-Guided-RGF/blob/master/attack.py#L260), specifically this part:

np.sqrt(np.sum(np.square(prior)) * norm_square)

Why do you include the sum of the element-wise squared prior in this denominator term? From equation 16 in the paper, shouldn't this just be:

np.sqrt(299*299*3 * norm_square)

Is there an advantage to doing it one way or another? Or, am I missing something obvious?

Thanks in advance,
Nate

Hello Nate, thanks for your interest in our work! This is a good question, since we did not explain this point in the code. The reason is that in our code we have

prior = prior / np.maximum(1e-12, np.sqrt(np.mean(np.square(prior))))

Hence the $\ell_2$ norm of prior is $\sqrt{3 \times 299 \times 299}$ instead of 1 (while in the paper the $\ell_2$ norm of $v$ is 1). Therefore, we need to include the term np.sqrt(np.sum(np.square(prior))) in the denominator. Note that after this normalization, np.sum(np.square(prior)) equals exactly 3*299*299, so the two expressions are numerically equivalent; the code form is just the general cosine-similarity denominator written in terms of the actual norm of prior.
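
To make the equivalence concrete, here is a minimal NumPy sketch. The random prior and grad arrays and the norm_square value are stand-ins for illustration only, not the quantities actually computed in attack.py:

```python
import numpy as np

D = 299 * 299 * 3  # input dimension for 299x299x3 images

# Stand-in arrays for illustration; in attack.py these come from the
# surrogate-model prior and the gradient-norm estimate.
rng = np.random.default_rng(0)
prior = rng.standard_normal(D)
grad = rng.standard_normal(D)

# RMS normalization as in the code: the mean of squares becomes 1,
# so the l2 norm of prior becomes sqrt(D) rather than 1.
prior = prior / np.maximum(1e-12, np.sqrt(np.mean(np.square(prior))))

norm_square = np.sum(np.square(grad))  # stand-in for the norm estimate

# The two denominators coincide because np.sum(np.square(prior)) == D:
alpha_code = np.sum(prior * grad) / np.sqrt(np.sum(np.square(prior)) * norm_square)
alpha_paper = np.sum(prior * grad) / np.sqrt(D * norm_square)
assert np.isclose(alpha_code, alpha_paper)
```

With the RMS normalization, np.sum(np.square(prior)) is exactly D, so either denominator gives the same value; the code's version stays correct even if the normalization scheme were changed.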

Feel free to ask me if you have any other questions.

OK, that makes sense; thanks for the quick reply. My only other question: how critical is the automatic sigma scaling in the code (e.g., lines 255 and 323)? I understand why you are doing it, but is it just a trick to squeeze out a little extra performance, or is it more important than that?

The sigma scaling merely serves to prevent numerical issues in the finite differences when the gradient of the target model is very small. In fact, an early version of the code did not include this scaling, but we found that when attacking several images, NaN occurred (and when a NaN occurs, the attack seems to end and report success, which is not correct), so we added this part to avoid NaNs.
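
For readers curious what such a guard can look like, here is a hypothetical sketch of the idea; this is not the exact logic in attack.py, and loss_fn, directions, and the 10x rescale factor are assumptions made up for illustration:

```python
import numpy as np

def safe_grad_estimate(loss_fn, x, directions, sigma, max_rescale=5):
    """Hypothetical sketch of auto sigma scaling, not the code in attack.py.

    If the loss differences under radius sigma are too small, normalizing
    the gradient estimate would divide by ~0 and yield NaN, so we enlarge
    sigma and try again.
    """
    for _ in range(max_rescale):
        # Two-point finite differences along each random direction u.
        diffs = np.array([(loss_fn(x + sigma * u) - loss_fn(x)) / sigma
                          for u in directions])
        grad = sum(d * u for d, u in zip(diffs, directions))
        norm = np.sqrt(np.sum(np.square(grad)))
        if norm > 1e-12:       # estimate is numerically safe to normalize
            return grad / norm
        sigma *= 10.0          # gradient too flat at this scale; enlarge sigma
    raise RuntimeError("gradient estimate vanished even after rescaling sigma")
```

The point is only the control flow: detect the near-zero estimate before normalization turns it into NaN, instead of letting the attack terminate and report a spurious success.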

Thanks for the help.