This is a repository for testing language model robustness to training data noise.
To generate the datasets used in our experiments, run:
python gen_datasets.py
Scripts for training and testing basic models are also included. Once the datasets have been generated, the training script can be run with:
./scripts/train.sh
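
For readers unfamiliar with the term, the sketch below illustrates one simple kind of training data noise: random character substitution. It is only an illustration and is not taken from gen_datasets.py; the function name add_char_noise, its parameters, and the choice of noise type are all assumptions made for this example.

```python
import random


def add_char_noise(text, noise_rate, rng):
    """Return a copy of text with letters randomly substituted.

    Hypothetical helper for illustration only; the noise used in this
    repository's experiments may be entirely different.
    Each alphabetic character is replaced by a random lowercase letter
    with probability noise_rate. Other characters are left unchanged.
    """
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < noise_rate:
            out.append(rng.choice(alphabet))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    clean = "the quick brown fox jumps over the lazy dog"
    for rate in (0.0, 0.1, 0.5):
        print(rate, add_char_noise(clean, rate, rng))
```

Passing an explicit random.Random instance keeps the corruption reproducible, which matters when comparing models trained at different noise rates.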