Backdoor detector for BadNets trained on the YouTube Face dataset
https://drive.google.com/drive/folders/1mf9UHHPq6tg8kZGlFTrFpCJfkg4zhWiB?usp=sharing
Add this Google Drive folder to your Drive and follow the notebook snippets. In the Google Drive folder you can check the results in:
- results folder
- repaired-networks folder
There are similar folders (results_we_got and repaired-networks_we_got) that contain the results we got.
The project is about detecting backdoor attacks via input filters, neuron pruning and unlearning. Given a trained DNN model, we have to find out whether there is any input trigger that produces misclassifications when it is added to an input (i.e., adversarial images).
To understand this, we first have to know what does not fall into this category:
- It is not an image-specific modification (that is, not an adversarial attack).
- It is not adversarial poisoning (where an incorrect label association is made at training time, or a trained model is modified).
A backdoor attack is one where unexpected results occur when a trigger is added to the input. Without the trigger, the model behaves perfectly normally.
BadNets: the backdoor is injected by training the model on both adversarial (triggered) images and clean images, which gives a 99% attack success rate. Another approach, the Trojan Attack (the more recent one), is far more efficient and requires less data.
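To make the BadNets idea concrete, here is a minimal sketch of how a training set could be poisoned with a trigger. It is not the code used in the notebooks. The function names (stamp_trigger, poison_dataset), the mask/pattern representation of the trigger, and the fraction parameter are all assumptions made for illustration.

```python
import numpy as np

def stamp_trigger(images, mask, pattern):
    """Blend a trigger into a batch of images (hypothetical helper).

    images:  (N, H, W, C) float array
    mask:    (H, W, 1) float array, 1.0 where the trigger replaces the pixel
    pattern: (H, W, C) float array holding the trigger colors
    """
    return (1.0 - mask) * images + mask * pattern

def poison_dataset(images, labels, mask, pattern, target_label, fraction, seed=0):
    """Stamp the trigger on a random fraction of the samples and relabel
    them as target_label. Returns copies; the inputs are left untouched."""
    rng = np.random.default_rng(seed)
    n_poison = int(round(fraction * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)

    poisoned_images = images.copy()
    poisoned_labels = labels.copy()
    poisoned_images[idx] = stamp_trigger(images[idx], mask, pattern)
    poisoned_labels[idx] = target_label
    return poisoned_images, poisoned_labels, idx
```

A model trained on the output of poison_dataset sees mostly clean samples, so it keeps its normal accuracy, while the relabeled samples teach it to associate the trigger with target_label.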
The given model is a backdoored DNN, and it only reveals the trigger (a collection of pixels and their associated colors) when it is used for prediction, which is what makes the attack stealthy.
- Detect the backdoor and label backdoored inputs as a separate class.
- Identify the trigger that was used.
- Lastly, repair the backdoored DNN.
- First, we find the minimal trigger needed to misclassify inputs from all other labels into a given target label.
- We do that for every label and then use outlier detection to find the real trigger, since the real trigger is much smaller than the others.
- Now that we have found which neurons are activated by the trigger, we remove the neurons related to the backdoor (patching the DNN via neuron pruning), or we make the model unlearn the backdoor using the reversed trigger.
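The outlier detection and pruning steps above can be sketched as follows. This is an illustrative sketch, not the notebook code. It assumes that trigger size is measured by the L1 norm of each reversed trigger mask, that outliers are found with the median absolute deviation (MAD) and a threshold of 2, and that neurons are ranked by how much more they activate on triggered inputs than on clean ones. The function names and parameters are hypothetical.

```python
import numpy as np

def anomaly_indices(l1_norms):
    """Signed MAD-based score per label; strongly negative means the
    trigger for that label is unusually small."""
    norms = np.asarray(l1_norms, dtype=float)
    median = np.median(norms)
    # 1.4826 makes the MAD comparable to a standard deviation
    # under a normal distribution (assumed here).
    mad = 1.4826 * np.median(np.abs(norms - median))
    if mad == 0:
        return np.zeros_like(norms)
    return (norms - median) / mad

def flag_backdoor_labels(l1_norms, threshold=2.0):
    """Return the labels whose trigger is an outlier on the small side."""
    scores = anomaly_indices(l1_norms)
    return [int(i) for i in np.where(scores < -threshold)[0]]

def prune_mask(clean_acts, triggered_acts, prune_fraction=0.1):
    """Boolean keep-mask over the neurons of one layer.

    clean_acts, triggered_acts: (N, num_neurons) activation arrays.
    The neurons with the largest mean activation gap (triggered minus
    clean) are marked False, i.e. to be pruned.
    """
    gap = triggered_acts.mean(axis=0) - clean_acts.mean(axis=0)
    n_prune = int(round(prune_fraction * gap.size))
    keep = np.ones(gap.size, dtype=bool)
    if n_prune > 0:
        keep[np.argsort(gap)[-n_prune:]] = False
    return keep
```

For example, if the reversed triggers for eight labels have L1 norms of [95, 102, 98, 110, 12, 105, 99, 101], flag_backdoor_labels flags only label 4, whose trigger is far smaller than the rest. The keep-mask from prune_mask can then be multiplied into the layer output to silence the flagged neurons.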