ebagdasa/backdoors101

Questions regarding evading Neural Cleanse

openbbox opened this issue · 1 comment

Hi,

Thanks for sharing the code.

I am trying to reproduce the results from the USENIX paper "Blind Backdoors in Deep Learning Models" on evading the Neural Cleanse defense. I am using the MNIST dataset. I assume that if I uncomment the line "- neural_cleanse" under "loss_tasks" in configs/mnist_params.yaml, this gives the same loss function as the one described in Section 6.1 of the paper. Correct me if this is not the case.
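For reference, the change I am making looks roughly like this (the other entries shown are just what my copy of configs/mnist_params.yaml happens to list, so treat them as an example rather than the exact file contents):

```yaml
# configs/mnist_params.yaml (excerpt)
loss_tasks:
  - normal          # main-task loss
  - backdoor        # backdoor-task loss
  - neural_cleanse  # previously commented out; enables the evasion loss
```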

So I train a model with the above setting, which should evade detection by Neural Cleanse. However, when I scan the trained model with Neural Cleanse, I get an anomaly index larger than 2, which means the model is still flagged as backdoored.
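For context, by anomaly index I mean the MAD-based outlier score from the Neural Cleanse paper, which I compute roughly like this (a sketch of my scanning script, not code from this repo):

```python
# Sketch: the MAD-based anomaly index used by Neural Cleanse.
# Input: L1 norms of the reverse-engineered trigger mask for each class label.
import numpy as np

def anomaly_index(l1_norms):
    """Return the anomaly index of the smallest trigger norm.

    Neural Cleanse flags a model as backdoored when this value exceeds ~2,
    i.e. when one class needs a much smaller trigger than the others.
    """
    norms = np.asarray(l1_norms, dtype=float)
    median = np.median(norms)
    # Median absolute deviation, scaled by 1.4826 for consistency with the
    # standard deviation of a normal distribution.
    mad = 1.4826 * np.median(np.abs(norms - median))
    return np.abs(norms.min() - median) / mad

# Example: the backdoor target class needs a much smaller trigger than the rest.
print(anomaly_index([80, 75, 12, 90, 85, 78, 82, 88, 79, 81]))  # well above 2
```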

Is there anything I am not configuring properly? Would you be able to take a look? I'd really appreciate it.

Thanks a lot for the feedback. I haven't tested it on MNIST; let me try to see if I can make it work.