
Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency

Primary language: Python. License: MIT.

Probability Weighted Word Saliency (PWWS)

This repository contains a Keras implementation of the ACL 2019 paper Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency.

Overview

  • data_set/aclImdb/, data_set/ag_news_csv/ and data_set/yahoo_10 are placeholder directories for the IMDB Review, AG's News and Yahoo! Answers datasets, respectively.
  • word_level_process.py and char_level_process.py contain the word-level and char-level dataset preprocessing methods, respectively.
  • neural_networks.py contains implementations of the four neural networks (word-based CNN, bi-directional LSTM, char-based CNN, LSTM) used in the paper.
  • Use training.py to train the four neural networks in neural_networks.py.
  • fool.py, evaluate_word_saliency.py, get_NE_list.py, adversarial_tools.py and paraphrase.py build the experiment pipeline.
  • Use evaluate_fool_results.py to evaluate the classification accuracy and word replacement rate of the adversarial examples generated by PWWS.
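
As a rough illustration of the scoring idea behind PWWS as the paper describes it (this is not the code in evaluate_word_saliency.py or adversarial_tools.py), the sketch below weights the probability drop caused by each word's best substitute by the softmax of the word saliency vector, then orders word positions by that score. The function names and the toy numbers are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def pwws_scores(saliency, delta_p):
    """Score each word position as softmax(saliency)[i] * delta_p[i].

    saliency[i]: drop in true-class probability when word i is masked out.
    delta_p[i]:  drop in true-class probability for the best substitute of word i.
    """
    if len(saliency) != len(delta_p):
        raise ValueError("saliency and delta_p must have the same length")
    weights = softmax(saliency)
    return [w * d for w, d in zip(weights, delta_p)]


def replacement_order(saliency, delta_p):
    """Word positions sorted from highest to lowest score."""
    scores = pwws_scores(saliency, delta_p)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy values for a three-word input (made up for illustration).
saliency = [0.05, 0.40, 0.10]
delta_p = [0.02, 0.30, 0.25]
print(replacement_order(saliency, delta_p))
```

In the attack described in the paper, words are substituted in this order until the classifier's prediction changes; the two probability-drop lists would come from querying the trained model rather than from fixed numbers.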

Dependencies

  • Python 3.7.1.
  • The versions of all required libraries are specified in requirements.txt. To reproduce the reported results, please make sure that the specified versions are installed.
  • If you have not downloaded WordNet (a lexical database for the English language), use nltk.download('wordnet') to do so. (Uncomment line 14 in paraphrase.py.)
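
If you would rather not edit paraphrase.py, a small helper along these lines can fetch WordNet once before running the pipeline. This is a minimal sketch: ensure_wordnet is a hypothetical name and is not part of this repository.

```python
def ensure_wordnet():
    """Download the WordNet corpus through NLTK if it cannot be found locally."""
    import nltk  # imported lazily so the helper can be defined without NLTK installed

    try:
        nltk.data.find("corpora/wordnet")
    except LookupError:
        # Calling download again is harmless if the corpus is already present.
        nltk.download("wordnet")


# Usage: call ensure_wordnet() once before running fool.py.
```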

Usage

  • Download the dataset files from Google Drive, which include
    • IMDB: aclImdb.zip. Decompress it and place the folder aclImdb in data_set/.
    • AG's News: ag_news_csv.zip. Decompress it and place the folder ag_news_csv in data_set/.
    • Yahoo! Answers: yahoo_10.zip. Decompress it and place the folder yahoo_10 in data_set/.
  • Download glove.6B.100d.txt from Google Drive and place the file in /.
  • Run training.py or use a command like python3 training.py --model word_cnn --dataset imdb --level word. You can reset the model hyper-parameters in neural_networks.py and config.py. Note that neither this repository nor the paper provides an implementation of char_cnn on the IMDB and Yahoo! Answers datasets.
  • Run fool.py or use a command like python3 fool.py --model word_cnn --dataset imdb --level word to generate adversarial examples using PWWS.
  • Run evaluate_fool_results.py to evaluate the adversarial examples.
  • If you want to train or fool different models, reset the arguments in training.py and fool.py.
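
The training and attack steps above can be chained from a small driver script. The sketch below only uses the --model, --dataset and --level flags shown in the commands above; build_command and run_pipeline are hypothetical helpers, and evaluate_fool_results.py is left out because its arguments are not documented here.

```python
import subprocess
import sys


def build_command(script, model, dataset, level):
    """Build the argument list for e.g. training.py --model word_cnn --dataset imdb --level word."""
    return [sys.executable, script,
            "--model", model, "--dataset", dataset, "--level", level]


def run_pipeline(model="word_cnn", dataset="imdb", level="word", dry_run=True):
    """Train a model, then attack it. With dry_run=True, only return the commands."""
    commands = [build_command(script, model, dataset, level)
                for script in ("training.py", "fool.py")]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)  # stop at the first failing step
    return commands


# Print the commands without executing them.
for command in run_pipeline():
    print(" ".join(command[1:]))
```

Pass dry_run=False from the repository root to actually execute both steps in sequence.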

Result on pretrained model

runs/ contains some pretrained NN models; information about these models is shown in the table below.

We use these pretrained models to generate 1000 adversarial examples with PWWS.

  • test_set is the classification accuracy on the full test set.
  • clean_1000 is the classification accuracy on the 1000 clean samples (drawn from the test set).
  • adv_1000 is the classification accuracy on the adversarial examples corresponding to the 1000 clean samples.
  • sub_rate is the word replacement rate defined in Section 4.4 of the paper.
  • NE_rate is (number of $NE_{adv}$) / (number of substitute words).
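
To make the metrics concrete, here is a minimal sketch of how they could be computed from raw counts. It assumes sub_rate is the number of substituted words divided by the total number of words (check Section 4.4 of the paper for the exact definition); the helper names and toy numbers are hypothetical and are not taken from evaluate_fool_results.py.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def rate(numerator, denominator):
    """Safe ratio that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


# Toy data (made up for illustration).
labels = [1, 0, 1, 1]
clean_predictions = [1, 0, 1, 0]  # model outputs on the clean samples
adv_predictions = [0, 1, 1, 0]    # model outputs on their adversarial versions

clean_acc = accuracy(clean_predictions, labels)
adv_acc = accuracy(adv_predictions, labels)

total_words = 200              # words across all clean samples
substituted_words = 12         # words replaced by the attack
named_entity_substitutes = 3   # substitutes that are named entities (NE_adv)

sub_rate = rate(substituted_words, total_words)
ne_rate = rate(named_entity_substitutes, substituted_words)

print(f"clean={clean_acc:.1%} adv={adv_acc:.1%} sub_rate={sub_rate:.1%} NE_rate={ne_rate:.1%}")
```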

If you want to use these models, rename them or modify the model paths in the .py files.

| data_set | neural_network | test_set | clean_1000 | adv_1000 | sub_rate | NE_rate |
|---|---|---|---|---|---|---|
| IMDB | word_cnn | 88.792% | 86.2% | 5.7% | 3.933% | 21.395% |
| IMDB | word_bdlstm | 87.472% | 86.8% | 2.0% | 4.206% | 11.094% |
| IMDB | word_lstm | 88.420% | 89.8% | 10.4% | 6.816% | 6.548% |
| AG's News | word_cnn | 90.526% | 89.0% | 13.2% | 12.308% | 30.877% |
| AG's News | word_bdlstm | 90.711% | 89.3% | 12.9% | 13.494% | 27.227% |
| AG's News | word_lstm | 91.829% | 91.4% | 18.1% | 18.102% | 27.374% |
| AG's News | char_cnn | 88.224% | 88.5% | 20.0% | 11.979% | 23.241% |
| Yahoo! Answers | word_cnn | 88.427% | 96.1% | 8.7% | 33.067% | 12.768% |
| Yahoo! Answers | word_bdlstm | 88.876% | 94.4% | 9.4% | 20.752% | 7.016% |

Contact

  • If you have any questions regarding the code, please create an issue or contact the owner of this repository.

Acknowledgments