myTextFooler

A Model for Natural Language Attack on Text Classification and Inference

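Words are ranked by importance using a temporal score, and the key words are replaced with synonyms. Candidates that stay highly similar to the original text while having a large effect on the model's prediction are selected as adversarial examples.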
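The rank, substitute, and filter loop can be illustrated with a simplified, self-contained sketch. It is not this repository's implementation. It scores words by deletion (leave-one-out) as a stand-in for the repository's scoring, substitutes synonyms greedily in order of importance, and keeps a substitution only if a similarity function stays at or above a threshold (0.7 here, mirroring the sim_score_threshold default). The predictor, synonym table, and similarity function are hypothetical toys.

```python
from typing import Callable, Dict, List

# Hypothetical interface: a predictor maps a batch of token lists to class probabilities.
Predictor = Callable[[List[List[str]]], List[List[float]]]


def argmax(probs: List[float]) -> int:
    return max(range(len(probs)), key=probs.__getitem__)


def word_importance(predict: Predictor, words: List[str], label: int) -> List[float]:
    """Leave-one-out score: drop in true-label probability when each word is deleted."""
    orig_prob = predict([words])[0][label]
    without_each = [words[:i] + words[i + 1:] for i in range(len(words))]
    return [orig_prob - probs[label] for probs in predict(without_each)]


def greedy_attack(predict: Predictor, words: List[str], label: int,
                  synonyms: Dict[str, List[str]],
                  similarity: Callable[[List[str], List[str]], float],
                  sim_threshold: float = 0.7) -> List[str]:
    """Substitute words in order of importance until the predicted label flips."""
    scores = word_importance(predict, words, label)
    order = sorted(range(len(words)), key=lambda i: scores[i], reverse=True)
    adv = list(words)
    for i in order:
        # Keep only substitutions that stay similar enough to the original text.
        candidates = [adv[:i] + [syn] + adv[i + 1:] for syn in synonyms.get(adv[i], [])]
        candidates = [c for c in candidates if similarity(words, c) >= sim_threshold]
        if not candidates:
            continue
        probs = predict(candidates)
        flipped = [c for c, p in zip(candidates, probs) if argmax(p) != label]
        if flipped:
            # Among label-flipping candidates, return the one closest to the original.
            return max(flipped, key=lambda c: similarity(words, c))
        # Otherwise keep the candidate that most lowers the true-label probability.
        best = min(range(len(candidates)), key=lambda k: probs[k][label])
        if probs[best][label] < predict([adv])[0][label]:
            adv = candidates[best]
    return adv  # may still be classified correctly if no substitution succeeded


# --- Toy usage (everything below is illustrative) ---
POSITIVE = {"good", "great"}


def toy_predict(batch: List[List[str]]) -> List[List[float]]:
    # P(positive) grows with the number of "positive" words in the text.
    out = []
    for ws in batch:
        p_pos = min(0.9, 0.2 + 0.4 * sum(w in POSITIVE for w in ws))
        out.append([1.0 - p_pos, p_pos])
    return out


def overlap(a: List[str], b: List[str]) -> float:
    # Stand-in for a sentence-encoder similarity: share of unchanged positions.
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


adv = greedy_attack(toy_predict, ["the", "movie", "was", "good"], 1,
                    {"good": ["decent", "great"]}, overlap)
print(adv)  # -> ['the', 'movie', 'was', 'decent']
```

In the toy run, "good" is the most important word, and replacing it with "decent" both passes the similarity threshold and flips the prediction.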

Environment

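This repository is modeled on TextFooler and adapted to the platform. To set up the environment, follow the instructions in the original repository or see build.sh.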

Usage

Once the environment is installed, unzip counter-fitted-vectors.txt.zip inside the myTextFooler directory.
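If an unzip tool is not at hand, the same step can be done with Python's standard-library zipfile module. This sketch assumes the archive contains counter-fitted-vectors.txt at its top level; the helper name extract_embeddings is hypothetical.

```python
import os
import zipfile


def extract_embeddings(zip_path: str = "counter-fitted-vectors.txt.zip",
                       dest_dir: str = ".") -> str:
    """Extract the counter-fitted vectors (once) and return the path to the .txt file."""
    target = os.path.join(dest_dir, "counter-fitted-vectors.txt")
    if not os.path.exists(target):  # skip if it was already unzipped
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(dest_dir)
    if not os.path.exists(target):
        raise FileNotFoundError(f"{target} not found after extracting {zip_path}")
    return target
```

Calling extract_embeddings() from inside the myTextFooler directory places the file where the default counter_fitting_embeddings_path expects it.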

First, create an instance of the TextFooler class:

textfooler = TextFooler(model=predictor, device='gpu0', IsTargeted=False)
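This README does not specify what predictor must be. Assuming it follows the original TextFooler convention (a callable that takes a batch of tokenized texts, i.e. lists of words, and returns per-class probabilities), a wrapper around a model could look like the sketch below. make_predictor and toy_logits are hypothetical, and plain Python lists stand in for tensors.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical model interface: a batch of token lists -> one logit vector per text.
LogitFn = Callable[[List[List[str]]], List[Sequence[float]]]


def make_predictor(logit_fn: LogitFn,
                   batch_size: int = 32) -> Callable[[List[List[str]]], List[List[float]]]:
    """Wrap a batched logit function into a predictor that returns probabilities."""

    def softmax(logits: Sequence[float]) -> List[float]:
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def predictor(texts: List[List[str]]) -> List[List[float]]:
        probs: List[List[float]] = []
        # Feed the model batch_size texts at a time to bound memory use.
        for start in range(0, len(texts), batch_size):
            batch = texts[start:start + batch_size]
            probs.extend(softmax(logits) for logits in logit_fn(batch))
        return probs

    return predictor


def toy_logits(batch: List[List[str]]) -> List[List[float]]:
    # Toy two-class model: the "positive" logit is the number of times "good" occurs.
    return [[0.0, float(words.count("good"))] for words in batch]


predictor = make_predictor(toy_logits, batch_size=2)
probs = predictor([["a", "good", "film"], ["a", "bad", "film"], ["good", "good"]])
```

The original TextFooler code works on torch tensors, so in practice the returned list may need converting, for example with torch.tensor(probs).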

Then pass the input texts and their labels to generate adversarial examples:

adv_xs = textfooler.generate(texts, labels)
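Once adv_xs is returned, a quick sanity check is the attack success rate: among the examples the model classified correctly before the attack, the fraction whose adversarial version is misclassified. This sketch assumes adv_xs is aligned one-to-one with texts and can be fed to the same predictor, which the README does not state; attack_success_rate is a hypothetical helper.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[List[List[str]]], List[Sequence[float]]]


def _argmax(probs: Sequence[float]) -> int:
    return max(range(len(probs)), key=lambda k: probs[k])


def attack_success_rate(predictor: Predictor, texts: List[List[str]],
                        adv_xs: List[List[str]], labels: Sequence[int]) -> float:
    """Share of originally-correct examples whose adversarial version is misclassified."""
    orig_preds = [_argmax(p) for p in predictor(texts)]
    adv_preds = [_argmax(p) for p in predictor(adv_xs)]
    # Only examples the model got right before the attack count towards the rate.
    attacked = [(a, y) for o, a, y in zip(orig_preds, adv_preds, labels) if o == y]
    if not attacked:
        return 0.0
    return sum(a != y for a, y in attacked) / len(attacked)
```

Excluding examples that were already misclassified keeps the rate from being inflated by errors the attack did not cause.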

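For a complete example, see text.py. (Note: text.py needs some code from the original repository; check its imports for the exact dependencies. It is only a test script, and using the class itself does not require the original repository.)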

The code in this repository has only been tested with the same configuration as the original repository.

Parameters

perturb_ratio: Whether to use random perturbation for the ablation study. Default: 0.

sim_score_threshold: Required minimum semantic similarity score. Default: 0.7.

import_score_threshold: Required minimum importance score. Default: -1.

sim_score_window: Text length (number of tokens) used to compute the semantic similarity score. Default: 15.

synonym_num: Number of synonyms to extract. Default: 50.

batch_size: Batch size used to get predictions. Default: 32.

counter_fitting_embeddings_path: Path to the counter-fitted embeddings used to find synonyms. Default: myTextFooler/counter-fitted-vectors.txt.

counter_fitting_cos_sim_path: Path to pre-computed cosine similarity scores based on the counter-fitted embeddings. Default: empty.

USE_model_path: Path to the USE (Universal Sentence Encoder) model. Default: empty.
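The synonym-related parameters can be illustrated with a brute-force nearest-neighbour search over the counter-fitted embeddings. This is a sketch, not the repository's code. It assumes each line of the embeddings file is a word followed by its vector components, and the 0.5 minimum cosine similarity (min_sim) is an illustrative assumption. synonym_num caps how many neighbours are returned, as in the parameter list.

```python
import math
from typing import Dict, Iterable, List, Tuple


def load_embeddings(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse 'word v1 v2 ...' lines and L2-normalise each vector."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vec = [float(x) for x in parts[1:]]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors[parts[0]] = [x / norm for x in vec]
    return vectors


def top_synonyms(word: str, vectors: Dict[str, List[float]],
                 synonym_num: int = 50, min_sim: float = 0.5) -> List[Tuple[str, float]]:
    """Return up to synonym_num neighbours of word with cosine similarity >= min_sim."""
    if word not in vectors:
        return []
    query = vectors[word]
    scored = []
    for other, vec in vectors.items():
        if other == word:
            continue
        sim = sum(a * b for a, b in zip(query, vec))  # dot product of unit vectors = cosine
        if sim >= min_sim:
            scored.append((other, sim))
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:synonym_num]
```

With the real file, the vectors would be loaded via `with open("counter-fitted-vectors.txt", encoding="utf-8") as f: vectors = load_embeddings(f)`. Scanning the whole vocabulary for every query is slow; precomputing the cosine similarities once, which is presumably what counter_fitting_cos_sim_path is for, avoids that cost.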