@article{feng2021survey,
title={A Survey of Data Augmentation Approaches for NLP},
author={Feng, Steven Y and Gangal, Varun and Wei, Jason and Chandar, Sarath and Vosoughi, Soroush and Mitamura, Teruko and Hovy, Eduard},
journal={Findings of ACL},
year={2021}
}
Note: Inquiries should be directed to stevenyfeng@gmail.com or raised by opening an issue in this repository.
Text Classification
Paper
Datasets
Synonym Replacement (Character-Level Convolutional Networks for Text Classification, NeurIPS '15)
AG’s News, DBpedia, Yelp, Yahoo Answers, Amazon
That’s So Annoying!!!: A Lexical and Frame-Semantic Embedding Based Data Augmentation Approach to Automatic Categorization of Annoying Behaviors using #petpeeve Tweets (EMNLP '15)
Twitter
Robust Training under Linguistic Adversity (EACL '17) [code]
Movie review, customer review, SUBJ, SST
Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations (NAACL '18) [code]
SST, SUBJ, MPQA, RT, TREC
Variational Pretraining for Semi-supervised Text Classification (ACL '19) [code]
IMDb, AG News, Yahoo, Hatespeech
EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks (EMNLP '19) [code]
SST, CR, SUBJ, TREC, PC
Nonlinear Mixup: Out-Of-Manifold Data Augmentation for Text Classification (AAAI '20)
TREC, SST, SUBJ, MR
MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification (ACL '20) [code]
AG News, DBpedia, Yahoo, IMDb
Unsupervised Data Augmentation for Consistency Training (NeurIPS '20) [code]
Yelp, IMDb, Amazon, DBpedia
Not Enough Data? Deep Learning to the Rescue! (AAAI '20)
ATIS, TREC, WVA
SSMBA: Self-Supervised Manifold Based Data Augmentation for Improving Out-of-Domain Robustness (EMNLP '20) [code]
IWSLT'14
Data Boost: Text Data Augmentation Through Reinforcement Learning Guided Conditional Generation (EMNLP '20)