Usage of the files:
Main files: raw_glue.py, raw_run_glue.py and run_glue.py are used to fine-tune BERT; they were downloaded from Hugging Face.
Almost all of the *.sh scripts are used to run the files above.
active*.py is used to test the pipeline as the teacher.
align_predictions.py is used to filter the data once or twice more in our method.
getseeds.py: gets the keywords from the dataset.
prepare_two_seeds.py: uses human knowledge as the teacher.
The ONION directory is used to test ONION; see the README inside it.
See the paper for more details.
The output stays on the remote server and is not included here because it is large (it contains the trained models).
chmod u+x *.sh
make the shell scripts executable
bash raw.sh
train on the poisoned data and test on the poisoned data
reports the attack success rate (ASR)
python test_pipeline_on_clean.py
gets the pipeline accuracy (ACC)
test on the clean data and the raw model
python test_pipeline_on_poisoned.py
gets the pipeline attack success rate (ASR)
bash raw_on_clean.sh
train on the poisoned data and test on the clean data
reports the accuracy (ACC)
bash raw_clean.sh
train on the clean data and test on the clean data
bash raw_clean_poison.sh
train on the clean data and test on the poisoned data
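The two metrics reported by the scripts above, accuracy (ACC) on clean data and attack success rate (ASR) on poisoned data, can be sketched as follows. This is a minimal illustration and not the code used in this repository: the function names are hypothetical, and excluding samples whose true label already equals the target label from the ASR is an assumption about how the rate is defined.

```python
from typing import Sequence


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if not labels:
        return 0.0
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def attack_success_rate(
    preds: Sequence[int], true_labels: Sequence[int], target: int
) -> float:
    """Fraction of poisoned samples that the model assigns to the target label.

    Assumption: samples whose true label is already the target are skipped,
    since predicting the target for them is not evidence of a backdoor.
    """
    if len(preds) != len(true_labels):
        raise ValueError("preds and true_labels must have the same length")
    pairs = [(p, y) for p, y in zip(preds, true_labels) if y != target]
    if not pairs:
        return 0.0
    return sum(p == target for p, _ in pairs) / len(pairs)
```

With target 1 (as in the commands in this file), a lower ASR after a defense means fewer poisoned samples are still classified as the target label.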
python active_pipline.py
./train_pipeline.sh 0 sst2 word 1 0.05 pipeline
python align_predictions.py --dataset sst2 --type word --target 1 --rate 0.05 --defense pipeline
./train_sanitized.sh 0 sst2 word 1 0.05 pipeline
python getseeds.py
result:
[(' ', 69793), ('rrb', 154), ('good', 143), ('funny', 116), %%('love', 108), %('best', 105), ('right', 99), ('comedy', 99), ('young', 98), ('lrb', 98), ('little', 95), ('makes', 95), ('come', 95), ('make', 94), ('characters', 92), ('life', 88), ('high', 87), ('way', 85), ('new', 80), ('work', 76), ('drama', 74), ('time', 73), ('performances', 72), ('movies', 71), ('look', 67), ('cast', 65), ('old', 63), ('great', 61), ('real', 59), ('big', 59), ('films', 58), ('performance', 56), ('fun', 55), ('entertaining', 55), ('world', 55), ('sense', 54), ('tale', 54), ('character', 54), ('man', 53), ('people', 53), ('really', 52), ('family', 50), ('human', 49), ('feel', 49), ('fascinating', 47), ('heart', 46), ('better', 46), ('year', 45), ('end', 44), ('self', 44)]
[(' ', 51304), ('rrb', 116), %%('bad', 104), ('lrb', 88), ('time', 78), ('characters', 78), ('good', 77), ('little', 76), ('comedy', 73), ('plot', 67), ('make', 60), ('really', 59), ('way', 57), ('long', 51), ('script', 51), ('hard', 50), ('better', 48), ('makes', 47), ('minutes', 46), ('thing', 46), ('feel', 45), ('self', 45), ('movies', 44), ('kind', 44), ('new', 43), ('no', 42), ('ve', 40), ('old', 40), ('work', 39), ('funny', 39), ('audience', 38), ('people', 37), ('comes', 36), ('life', 35), ('drama', 34), ('ca', 34), %%('worst', 33), ('things', 33), ('watching', 32), ('character', 32), ('acting', 32), ('hollywood', 32), ('big', 32), ('dialogue', 32), ('real', 31), ('ultimately', 31), ('sense', 31), ('quite', 30), ('ll', 30), ('far', 30)]
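The lists above have the shape of (token, count) pairs sorted by frequency. A minimal sketch of how such per-class keyword counts could be produced is given below. It is not the code of getseeds.py: the function name top_keywords, the small stop-word set and the whitespace tokenizer are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical stop-word list; the real script may filter differently.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "it", "this", "that"}


def top_keywords(sentences: Iterable[str], k: int = 50) -> List[Tuple[str, int]]:
    """Return the k most frequent non-stop-word tokens with their counts."""
    counts: Counter = Counter()
    for sentence in sentences:
        # Lowercase and split on whitespace (a stand-in for a real tokenizer).
        for token in sentence.lower().split():
            if token not in STOP_WORDS:
                counts[token] += 1
    return counts.most_common(k)


if __name__ == "__main__":
    # Toy sentences, one list per sentiment class.
    positive = ["a good and funny film", "the best comedy , good fun"]
    negative = ["a bad script", "the worst plot and bad acting"]
    print(top_keywords(positive, k=3))
    print(top_keywords(negative, k=3))
```

Running the function once per class gives one ranked list per label, from which seed words can then be picked by hand.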
python prepare_two_seeds.py --dataset sst2 --type word --target 1 --rate 0.05
./train_two_seeds.sh 0 sst2 word 1 0.05
python align_predictions.py --dataset sst2 --type word --target 1 --rate 0.05 --defense two_seeds
./train_sanitized.sh 0 sst2 word 1 0.05 two_seeds
python prepare_two_seedsraw.py --dataset sst2 --type word --target 1 --rate 0.05
./train_brain_raw.sh 0 sst2 word 1 0.05
python align_predictions.py --dataset sst2 --type word --target 1 --rate 0.05 --defense two_seeds_brain
./train_sanitized.sh 0 sst2 word 1 0.05 two_seeds_brain
python prepare_two_seedsraw.py --dataset sst2 --type sentence --target 1 --rate 0.05
./train_brain_raw.sh 0 sst2 sentence 1 0.05
python align_predictions.py --dataset sst2 --type sentence --target 1 --rate 0.05 --defense two_seeds_brain
./train_sanitized.sh 0 sst2 sentence 1 0.05 two_seeds_brain
python prepare_two_seeds.py --dataset sst2 --type sentence --target 1 --rate 0.05
./train_two_seeds.sh 0 sst2 sentence 1 0.05
python align_predictions.py --dataset sst2 --type sentence --target 1 --rate 0.05 --defense two_seeds
./train_sanitized.sh 0 sst2 sentence 1 0.05 two_seeds
./train_pipeline.sh 0 sst2 sentence 1 0.05 pipeline
python align_predictions.py --dataset sst2 --type sentence --target 1 --rate 0.05 --defense pipeline
./train_sanitized.sh 0 sst2 sentence 1 0.05 pipeline
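Each experiment above follows the same steps: train with a teacher, align the predictions, then retrain on the sanitized data. A minimal sketch that assembles these command lines for one configuration is given below. It only rebuilds the commands already listed in this file and does not run them. The helper name build_commands is hypothetical, and reading the first positional argument (0) as a device index is an assumption.

```python
from typing import List


def build_commands(
    defense: str,
    dataset: str = "sst2",
    trigger: str = "word",
    target: int = 1,
    rate: float = 0.05,
    device: int = 0,  # assumption: the leading 0 in the commands is a device index
) -> List[List[str]]:
    """Build the command sequence for one defense, as listed in this README."""
    opts = ["--dataset", dataset, "--type", trigger,
            "--target", str(target), "--rate", str(rate)]
    pos = [str(device), dataset, trigger, str(target), str(rate)]

    if defense == "pipeline":
        first = [["./train_pipeline.sh", *pos, defense]]
    elif defense == "two_seeds":
        first = [["python", "prepare_two_seeds.py", *opts],
                 ["./train_two_seeds.sh", *pos]]
    elif defense == "two_seeds_brain":
        first = [["python", "prepare_two_seedsraw.py", *opts],
                 ["./train_brain_raw.sh", *pos]]
    else:
        raise ValueError(f"unknown defense: {defense}")

    return first + [
        ["python", "align_predictions.py", *opts, "--defense", defense],
        ["./train_sanitized.sh", *pos, defense],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; to execute, each list
    # could be passed to subprocess.run(cmd, check=True).
    for cmd in build_commands("two_seeds", trigger="sentence"):
        print(" ".join(cmd))
```

Printing first makes it easy to compare the generated lines against the ones in this file before anything is launched.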