SCLBD/BackdoorBench

Implementation issue with ANP


Hi BackdoorBench Team,

Firstly, this tool is amazing. Thank you very much for putting in the effort to design and build it.

I have found an issue with the implementation of the ANP method. In both the evaluate_by_threshold and evaluate_by_number methods, the test ASR and the drop in ACC are used to determine which threshold value is selected. However, the test ASR would not be accessible to the defender when performing pruning, as the paper does not assume that backdoored data is available. Could you please clarify whether this is an issue or whether I am missing something?
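For concreteness, here is a rough sketch of the selection pattern I am describing (the names and the exact form of the accuracy budget are illustrative, not the actual BackdoorBench code):

```python
# Illustrative sketch only, not the actual BackdoorBench code.
# candidates: list of (threshold, clean_acc, test_asr) tuples measured
# after pruning with each candidate threshold.
def select_threshold(candidates, baseline_acc, max_acc_drop=0.10):
    # Keep only thresholds whose clean accuracy stays within budget.
    admissible = [c for c in candidates
                  if c[1] >= baseline_acc - max_acc_drop]
    if not admissible:
        return None
    # The problem: picking by test ASR assumes the defender can measure
    # ASR, which requires access to backdoored test data.
    return min(admissible, key=lambda c: c[2])[0]
```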

Thanks :)

Hello WhoDunnett.
Thank you very much for your question. We have tested more models in the benchmark, and the original ANP paper does not give an explicit criterion for choosing the threshold/number for these additional models. To prevent the model's accuracy from dropping too much, we test that ACC does not drop by more than a certain percentage, rather than using the difference between ACC and ASR; the same kind of criterion is used by the fine-pruning method in Section 2.2 of Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks. At the same time, we have provided a parameter to set the pruning amplitude directly: simply pass the pruning_number parameter at runtime.
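Roughly, the accuracy test looks like the following (a minimal sketch; the names and whether the drop is relative or in absolute points are illustrative, not the code in the repository):

```python
# Minimal sketch of the accuracy-drop check described above; names and
# the exact form of the percentage budget are illustrative.
def acc_within_budget(baseline_acc, pruned_acc, max_drop_ratio=0.10):
    """Accept a pruned model only if its clean accuracy has not dropped
    by more than max_drop_ratio relative to the unpruned baseline."""
    return pruned_acc >= baseline_acc * (1.0 - max_drop_ratio)
```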

Hi mdzhangst,

Thank you for getting back to me. I agree that a percentage drop is needed, given that the original ANP paper is vague about the stopping criterion. However, I think this stopping criterion should consider only ACC, as the ASR after each round would not be accessible to the defender. Given that ANP assumes access to clean data only, a defender can measure only the ACC after each round, so ACC alone should inform stopping rather than ACC and ASR together. Note that this is how the current FP implementation is designed. While the current implementation of ANP is unlikely to produce significantly different results, ASR can increase after several rounds (this is shown in some of the figures in the ANP paper). As a result, the current criterion biases the selected model towards the one with the lowest ASR that also satisfies the ACC drop constraint, which is problematic given that ANP assumes access to clean data only.
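In sketch form, the clean-data-only variant I have in mind drops ASR from the selection entirely (again, illustrative names rather than the actual implementation):

```python
# Illustrative sketch: select a pruning threshold using clean accuracy
# only, so the defender never needs access to backdoored test data.
def select_threshold_clean_only(candidates, baseline_acc,
                                max_acc_drop=0.10):
    # candidates: list of (threshold, clean_acc) pairs, ordered from the
    # least to the most aggressive pruning.
    selected = None
    for threshold, clean_acc in candidates:
        if clean_acc >= baseline_acc - max_acc_drop:
            selected = threshold  # keep pruning while ACC stays in budget
        else:
            break  # stop at the first threshold that degrades clean ACC
    return selected
```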

Hopefully, this makes sense. Please let me know if I am missing something.

Hi WhoDunnett,
Thank you for getting back to me. You are right that the ASR should not be used in the previous ANP criterion. We will modify this criterion in a later version. For now, you can run python ./defense/anp.py --pruning_number xx to set the pruning threshold of ANP directly.