This repository contains code, data and links to pre-trained models for the USENIX Security '21 paper titled "T-Miner: A Generative Approach to Defend Against Trojan Attacks on DNN-based Text Classification". Full paper available at: https://arxiv.org/pdf/2103.04264.pdf.
Create a virtualenv or anaconda environment using the requirements.txt provided. Then, copy the two provided directories for the texar library and replace the ones installed. This is necessary, as we have made multiple modifications to the library itself. You can find your installed texar paths in your virtual environment's site-packages directory. Those are the ones you will want to replace.
Below we provide the pipeline for training a clean classifier, testing it, training the T-Miner defense, running the perturbation generator, and finally running the Trojan identifier:
sh benign_classifier_train.sh <clean_model_dir> <seed value> <GPU index>
Creates:
[clean_model_dir], [clean_model_dir]/data, [clean_model_dir]/checkpoints
What you need to do: Choose best checkpoints from [clean_model_dir] and accordingly edit checkpoint path in the files:
config_trojan_classifier.py, config_defender.py, config.evaluation.py
sh benign_classifier_test.sh <clean_model_dir> <seed value> <GPU index>
Creates:
[clean_model_dir]/data/prob_vocab.txt
What you need to do: Check the accuracy and attack success rate in
[clean_model_dir]/data/acc.txt and [clean_model_dir]/data/asr.txt
sh benign_defender.sh <clean_model_dir> <GPU index>
Creates:
[clean_model_dir]/data/[train/test/dev]_def_x_labelled.txt
sh benign_evaluation.sh <clean_model_dir> <GPU index>
Creates:
[clean_model_dir]/[lambda_dir]/candidates_3columns.txt
sh benign_clustering.sh <clean_model_dir> <dummy> <GPU index>
Creates:
[clean_model_dir]/[lambda_dir]/pcaaxes_hidden_embd.png, [clean_model_dir]/[lambda_dir]/outliers_hidden_embd_result.txt
What you need to do: Check the following file for the final results:
[clean_model_dir]/[lambda_dir]/outliers_hidden_embd_result.txt file
Below we provide the pipeline for training a Trojan classifier, testing it, training the T-Miner defense, running the perturbation generator, and finally running the Trojan identifier:
sh trojan_classifier_train.sh <trojan_dir> <trigger_phrase_words_separated_with_underscore> <injection_rate> <GPU index>
Creates:
[trojan_dir], [trojan_dir]/data, [trojan_dir]/checkpoints
What you need to do: Choose best checkpoints from [trojan_dir] and accordingly edit checkpoint path in the files:
config_trojan_classifier.py, config_defender.py, config.evaluation.py
sh trojan_classifier_test.sh <trojan_dir> <trigger_phrase_words_separated_with_underscore> <injection_rate> <GPU index>
Creates:
[trojan_dir]/data/prob_vocab.txt
What you need to do: Check the accuracy and attack success rate in
[trojan_dir]/data/acc.txt and [trojan_dir]/data/asr.txt
sh trojan_defender_train.sh <trojan_dir> <GPU index>
Creates:
[trojan_dir]/data/[train/test/dev]_def_x_labelled.txt
sh trojan_evaluation.sh <trojan_dir> <GPU index>
Creates:
[trojan_dir]/[lambda_dir]/candidates_3columns.txt
sh trojan_clustering.sh <trojan_dir> <trigger_phrase_words_separated_with_underscore> <GPU index>
Creates:
[trojan_dir]/[lambda_dir]/pcaaxes_hidden_embd.png, [trojan_dir]/[lambda_dir]/outliers_hidden_embd_result.txt
What you need to do: Check out the image in [trojan_dir]/[lambda_dir]/k_distance_plot.png. Find the x-axis value of the point where the steepness starts. Take the floor of the x-axis value. That's your epsilon. Then run:
python script_internal_layer.py <trojan_dir> <trigger_phrase_words_separated_with_underscore> 1.0 0.5 0.03 get-outliers 5 <epsilon>
. Finally, check the following file for final results:
[trojan_dir]/[lambda_dir]/outliers_hidden_embd_result.txt
You can find pre-trained Tensorflow graph checkpoints for the entire training pipeline at https://drive.google.com/drive/u/1/folders/1txUWF_nhsKZDGWUgphmLEqXIC8EKklTy.
@inproceedings{azizi21tminer,
title={T-Miner: A Generative Approach to Defend Against Trojan Attacks on DNN-based Text Classification},
author={Azizi, Ahmadreza and Tahmid, Ibrahim and Waheed, Asim and Mangaokar, Neal amd Pu, Jiameng and Javed, Mobin and Reddy, Chandan K. and Viswanath, Bimal},
booktitle={Proc. of USENIX Security},
year={2021}
}