Augment BERT datasets, including CoLA, RTE, and STS-B, using synonym replacement, random swapping, random deletion, and deletion by POS tag.
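As a rough illustration of two of these operations, here is a minimal sketch of random swapping and random deletion over a list of tokens. It is not the code from this repository: the function names, the n_swaps and p parameters, and the rule of always keeping at least one token are assumptions made for the example.

```python
import random


def random_swap(tokens, n_swaps=1, rng=None):
    """Return a copy of `tokens` with `n_swaps` random pairs exchanged."""
    if rng is None:
        rng = random.Random()
    tokens = list(tokens)  # copy so the caller's list is left untouched
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        # Pick two distinct positions and exchange their tokens.
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p=0.1, rng=None):
    """Drop each token independently with probability `p`."""
    if rng is None:
        rng = random.Random()
    tokens = list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    # Assumption: never return an empty sentence, keep one random token.
    if not kept and tokens:
        kept = [rng.choice(tokens)]
    return kept
```

Passing a seeded random.Random instance as rng makes the augmentation reproducible, which helps when comparing runs on the same dataset.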
Install spaCy
pip install spacy
python -m spacy download en_core_web_sm
Install WordNet (through NLTK)
pip install -U nltk
The first time you use WordNet, download the corpus from Python:
import nltk
nltk.download('wordnet')
-
Synonyms are queried from WordNet with wordnet.synsets(word, pos=pos_map[pos]).
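The lookup above could be sketched roughly as follows. The contents of pos_map here are an assumption (spaCy coarse POS tags mapped to the WordNet POS codes 'n', 'v', 'a', 'r'); the mapping in this repository may differ. The get_synonyms name and the synsets_fn parameter are hypothetical and exist only so the example can run without the WordNet corpus.

```python
# Assumed mapping from spaCy coarse POS tags to WordNet POS codes.
# The repository's real pos_map may contain different entries.
pos_map = {"NOUN": "n", "VERB": "v", "ADJ": "a", "ADV": "r"}


def get_synonyms(word, pos, synsets_fn=None):
    """Return sorted WordNet synonyms of `word` for a spaCy POS tag."""
    if pos not in pos_map:
        return []  # no WordNet category for this tag
    if synsets_fn is None:
        # Imported lazily; requires nltk and nltk.download('wordnet').
        from nltk.corpus import wordnet
        synsets_fn = wordnet.synsets
    found = set()
    for synset in synsets_fn(word, pos=pos_map[pos]):
        for lemma in synset.lemmas():
            # WordNet joins multi-word lemmas with underscores.
            name = lemma.name().replace("_", " ")
            if name.lower() != word.lower():
                found.add(name)
    return sorted(found)
```

With NLTK and the WordNet corpus installed, calling get_synonyms("car", "NOUN") queries WordNet directly. Restricting the query by POS keeps a noun from being replaced with synonyms of an unrelated verb sense.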
-
The dataset location in augment_cola.py is hard-coded. Edit input_file and out_file to point to your data before running it.
token_utils.py is taken from https://github.com/commonsense/metanl/blob/master/metanl/token_utils.py