This repository contains the official implementation of the paper DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification, accepted at COLING 2022.
- Check the datasets. Training sets of SNLI and MultiNLI can be found in this link. Place them under the folders dataset/snli and dataset/multinli. We implemented the augmentation methods in DoubleMix in the files under the src/augment folder.
- Set up the environment
conda create -n doublemix python==3.8
conda activate doublemix
cd DoubleMix/
pip3 install -r requirements.txt
- Run DoubleMix, replacing [dataset] with the name of the dataset to train on
cd src/
CUDA_VISIBLE_DEVICES=0 python3 train.py --dataset [dataset] --aug 1
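Before launching training, it can help to confirm that the dataset folders from the first step are in place. The following is a minimal sketch, assuming it is run from the repository root. The helper name `missing_dataset_dirs` is hypothetical and not part of this repository. It only checks that each folder exists and is non-empty, because the expected file names inside them are not listed here.

```python
from pathlib import Path

# Folder names taken from the dataset step above.
EXPECTED_DIRS = ("dataset/snli", "dataset/multinli")


def missing_dataset_dirs(root="."):
    """Return the expected dataset folders that are absent or empty under root.

    Hypothetical helper for illustration; not part of the DoubleMix code.
    """
    missing = []
    for rel in EXPECTED_DIRS:
        folder = Path(root) / rel
        # A folder counts as ready only if it exists and holds at least one entry.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    problems = missing_dataset_dirs()
    if problems:
        print("Missing or empty dataset folders:", ", ".join(problems))
    else:
        print("Dataset folders look ready.")
```

An empty list from the helper means both folders are populated. It does not validate the contents of the files themselves.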
Please cite our paper if you find our work useful for your research:
@inproceedings{chen2022doublemix,
title={DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification},
author={Chen, Hui and Han, Wei and Yang, Diyi and Poria, Soujanya},
booktitle={Proceedings of the 29th International Conference on Computational Linguistics},
pages={4622--4632},
year={2022}
}
Should you have any questions, feel free to contact chchenhui1996@gmail.com.