Code for our paper accepted at CSICC 2021.
@inproceedings{rashid2021effect,
title={The Effect of Using Masked Language Models in Random Textual Data Augmentation},
author={Rashid, Mohammad Amin and Amirkhani, Hossein},
booktitle={2021 26th International Computer Conference, Computer Society of Iran (CSICC)},
pages={1--5},
year={2021},
organization={IEEE}
}
The original EDA code is shared here: https://github.com/jasonwei20/eda_nlp
This code can be run on any text classification dataset.
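As a reference for the input layout, here is a minimal sketch that writes a toy dataset, assuming the file follows the original EDA convention of one example per line in the form label, tab, sentence (the file name and labels below are made up for illustration):

# Toy TSV in the assumed EDA input format: <label>\t<sentence>
# (file name and labels are illustrative, not part of the repository)
examples = [
    ("1", "the movie was surprisingly good"),
    ("0", "the plot made no sense at all"),
]
with open("toy_train.tsv", "w", encoding="utf-8") as f:
    for label, sentence in examples:
        f.write(f"{label}\t{sentence}\n")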
First, install HuggingFace transformers:
pip install transformers
Second, set up a machine learning library of your choice (PyTorch or TensorFlow):
pip install torch
or
pip install tensorflow
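To verify that the installation works, a quick sanity check (assuming internet access to download pretrained weights) is to load a masked language model through the transformers fill-mask pipeline:

# Sanity check: load a masked language model and fill in a masked token.
# The checkpoint name below is a standard Hugging Face model; the repository
# may map its short names ('bert', 'roberta', 'distilbert') to other checkpoints.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="distilbert-base-uncased")
for prediction in fill_mask("The weather today is really [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))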
All configurations are similar to the original EDA (https://github.com/jasonwei20/eda_nlp), except for the type of masked language model you wish to use.
- Note that the mask_model parameter must be one of the listed names; if no model name is provided, it defaults to DistilBERT.
Available models: {'bert', 'roberta', 'distilbert'} (see the sketch below for the general idea of MLM-based word replacement)
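For intuition only, the following sketch shows the general idea behind MLM-based replacement: mask a randomly chosen word and substitute the model's top prediction. This is an illustration under the stated assumptions, not the repository's implementation (the actual augmentation logic lives in augment.py and the EDA code):

# Illustrative sketch of MLM-based word replacement (not the repository's code).
import random
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="distilbert-base-uncased")

def mlm_replace(sentence):
    # Pick a random word, mask it, and replace it with the model's top prediction.
    words = sentence.split()
    idx = random.randrange(len(words))
    masked = words.copy()
    masked[idx] = fill_mask.tokenizer.mask_token  # e.g. "[MASK]"
    predictions = fill_mask(" ".join(masked))
    words[idx] = predictions[0]["token_str"].strip()
    return " ".join(words)

print(mlm_replace("the movie was surprisingly good"))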
python ./augment.py --input='data/tr_2000.tsv' --output='data/tr2000_aug16.tsv' --mask_model='roberta' --num_aug=16 --alpha_sr=0.1 --alpha_rd=0.1 --alpha_ri=0.4 --alpha_rs=0.1
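To compare the available masked language models, one option (assuming the flags shown above) is a small driver script that calls augment.py once per model; the output file names are made up for this sketch:

# Illustrative driver: run augment.py once per supported masked language model.
# Flags mirror the example command above; output paths are invented for this sketch.
import subprocess

for model in ("bert", "roberta", "distilbert"):
    subprocess.run(
        [
            "python", "./augment.py",
            "--input=data/tr_2000.tsv",
            f"--output=data/tr2000_aug16_{model}.tsv",
            f"--mask_model={model}",
            "--num_aug=16",
            "--alpha_sr=0.1", "--alpha_rd=0.1", "--alpha_ri=0.4", "--alpha_rs=0.1",
        ],
        check=True,
    )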