This repository contains the code and resources associated with the paper:
Xingyi Zhao, Lu Zhang, Depeng Xu and Shuhan Yuan. 2022. Generating Textual Adversaries with Minimal Perturbation. In Findings of the Association for Computational Linguistics: EMNLP 2022. [paper]
- Python 3.8
- PyTorch 1.11.0
- transformers 4.20.1
- CUDA 11.5
To craft adversarial examples with TAMPERS, run (attacking "textattack/bert-base-uncased-rotten-tomatoes", for example):
python tampers.py --data_path data/MR.csv --victim_model "textattack/bert-base-uncased-rotten-tomatoes" --num 1000 --output_dir attack_result/
- --data_path: Path to the dataset. We use the MR dataset as an example. To reproduce our experiments, the datasets can be found in TAMPERS; for more datasets, you can check TextFooler. Our code assumes a binary classification task.
- --victim_model: You can find the fine-tuned models on huggingface-textattack. In our experiments, we use four fine-tuned models corresponding to the IMDB, MR, Yelp, and SST-2 datasets (a loading sketch follows this list).
- --num: Number of texts you want to attack.
- --output_dir: Output directory. You need to create an empty directory first.
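For reference, the victim models are standard Hugging Face sequence-classification checkpoints, so they can be loaded and queried as sketched below. This is only an illustrative sketch using transformers' AutoTokenizer / AutoModelForSequenceClassification, not the exact loading code in tampers.py; the example sentence is made up.

```python
# Minimal sketch (not the exact code in tampers.py) of loading and querying
# one of the victim models listed above from the Hugging Face hub.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "textattack/bert-base-uncased-rotten-tomatoes"  # example victim model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

text = "a gripping and surprisingly tender film ."  # made-up MR-style example
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)
print(probs)  # class probabilities for the binary sentiment task
```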
To run the baselines, you can refer to TextAttack.
Two issues should be noted here:
- Running BERT-Attack in this package takes a long time. See the issue here. Therefore, we follow the setting of TextDefender and skip replacing words that are tokenized into multiple sub-words (a tokenization sketch is shown after this list).
- When using USE to compute the semantic similarity, we correct the code. The TextFooler and BERT-Attack implementations forget to divide the angle between the two embeddings by pi. The correct computation is: 1 - arccos(cosine_similarity(u, v)) / pi. See here.
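The corrected computation from the second issue can be written directly from the formula above. The sketch below assumes u and v are two USE sentence embeddings represented as NumPy vectors; the toy vectors are only stand-ins.

```python
# Sketch of the corrected semantic similarity:
# 1 - arccos(cosine_similarity(u, v)) / pi
import numpy as np

def angular_similarity(u: np.ndarray, v: np.ndarray) -> float:
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    cos = np.clip(cos, -1.0, 1.0)        # guard against floating-point drift
    return 1.0 - np.arccos(cos) / np.pi  # divide the angle by pi, as noted above

# Toy vectors standing in for two USE embeddings
u = np.array([0.1, 0.7, 0.2])
v = np.array([0.2, 0.6, 0.3])
print(angular_similarity(u, v))
```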
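For the first issue, the sub-word filtering can be illustrated with a BERT tokenizer. The helper below is hypothetical (is_single_subword is not part of this repository) and only sketches the idea of skipping words that BERT splits into several word-pieces.

```python
# Hypothetical sketch: skip candidate words that the BERT tokenizer
# splits into multiple sub-words (the TextDefender-style setting above).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def is_single_subword(word: str) -> bool:
    # tokenize() returns word-pieces without [CLS]/[SEP]
    return len(tokenizer.tokenize(word)) == 1

words = ["film", "gripping", "unforgettable", "tamper-proof"]
print([w for w in words if is_single_subword(w)])  # multi-piece words are dropped
```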
We give an example result (saved in attack_result); the results below are based on the MR dataset. In our paper, we sampled five different sets of 1,000 examples and report the average as the final result.