

Data Noising as Smoothing in Neural Network Language Models

Dependencies

Overview

Based on the TensorFlow implementation here, which is in turn based on the PTB LSTM implementation here.

Implements noising for neural language modeling as described in this paper.

@inproceedings{noising2017,
  title={Data Noising as Smoothing in Neural Network Language Models},
  author={Xie, Ziang and Wang, Sida I. and Li, Jiwei and L{\'e}vy, Daniel and Nie, Aiming and Jurafsky, Dan and Ng, Andrew Y.},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2017}
}

The noising code can be found in loader.py and utils.py.
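The paper's two simplest schemes are unigram noising and blank noising. Unigram noising replaces each token, with probability gamma, by a word sampled from the unigram distribution. Blank noising replaces it with a placeholder "_" token. The sketch below illustrates both. It is not the code in loader.py or utils.py: the function names and the `BLANK` constant are hypothetical, and we assume the unigram counts come from the training corpus.

```python
import random
from collections import Counter

BLANK = "_"  # placeholder for blank noising (assumed; the repo may use a different token)


def unigram_noise(tokens, gamma, unigram_counts, rng=None):
    """With probability gamma, replace each token by a draw from the unigram distribution.

    unigram_counts is assumed to be a Counter built over the training corpus.
    """
    if rng is None:
        rng = random.Random()
    vocab = list(unigram_counts)
    weights = [unigram_counts[w] for w in vocab]
    noised = []
    for w in tokens:
        if rng.random() < gamma:
            # choices() samples proportionally to the unigram counts
            noised.append(rng.choices(vocab, weights=weights, k=1)[0])
        else:
            noised.append(w)
    return noised


def blank_noise(tokens, gamma, rng=None):
    """With probability gamma, replace each token by the BLANK placeholder."""
    if rng is None:
        rng = random.Random()
    return [BLANK if rng.random() < gamma else w for w in tokens]


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "the cat sat on the mat".split()
    counts = Counter(sentence)  # stand-in for corpus-wide unigram counts
    print(unigram_noise(sentence, 0.2, counts, rng))
    print(blank_noise(sentence, 0.2, rng))
```

Noising acts as a regularizer, so it is a training-time operation. Validation and test sequences are left unnoised.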

How to run

First, download the PTB data from here and put it in the data directory. Make sure to update the paths in cfg.py to point to the data. Alternatively, you can grab the Text8 data here and then run the script data/text8/makedata-text8.sh.
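The PTB files are plain text with one sentence per line. The sketch below turns them into word ids, modeled on the TensorFlow PTB reader that this code descends from. The file names ptb.train.txt, ptb.valid.txt and ptb.test.txt follow the standard PTB distribution. The `data_dir` argument stands in for whatever path cfg.py holds; we have not checked how cfg.py names or uses it. All function names here are hypothetical.

```python
import collections
import os


def read_words(path):
    """Read a PTB-style file: whitespace-separated tokens, one sentence per line."""
    with open(path, encoding="utf-8") as f:
        # Mark each sentence boundary with an explicit <eos> token.
        return f.read().replace("\n", " <eos> ").split()


def build_vocab(train_path):
    """Map words to ids, most frequent first, ties broken alphabetically."""
    counts = collections.Counter(read_words(train_path))
    words = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: i for i, w in enumerate(words)}


def file_to_ids(path, word_to_id):
    # PTB already maps rare words to <unk>, so unseen words should not occur;
    # this sketch skips them rather than raising.
    return [word_to_id[w] for w in read_words(path) if w in word_to_id]


def load_ptb(data_dir):
    """Return ({split: [ids]}, word_to_id) for the train/valid/test splits."""
    paths = {split: os.path.join(data_dir, f"ptb.{split}.txt")
             for split in ("train", "valid", "test")}
    word_to_id = build_vocab(paths["train"])
    data = {split: file_to_ids(p, word_to_id) for split, p in paths.items()}
    return data, word_to_id
```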

Then run lm.py. Here's an example setting:

python lm.py --run_dir /tmp/lm_1500_kn  --hidden_dim 1500 --drop_prob 0.65 --gamma 0.2 --scheme ngram --ngram_scheme kn --absolute_discounting
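The `--scheme ngram --ngram_scheme kn --absolute_discounting` flags appear to select the paper's bigram-based variants. In these variants the noising rate depends on the context word, and replacements are drawn from a Kneser-Ney style continuation distribution. As we read the paper, the absolute-discounting rate is gamma(x1) = gamma0 * N1+(x1, •) / sum_x2 c(x1, x2). Here N1+(x1, •) is the number of distinct words seen after x1. The KN proposal is q(x) ∝ N1+(•, x), proportional to the number of distinct words seen before x. The sketch below computes both quantities from a token list. Treating `--gamma` as gamma0 is our assumption, and the helper names are hypothetical. Check utils.py for what the flags actually do.

```python
from collections import Counter, defaultdict


def bigram_stats(tokens):
    """Collect distinct-neighbor sets and per-context bigram totals."""
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    followers = defaultdict(set)  # x1 -> {x2 : c(x1, x2) > 0}
    preceders = defaultdict(set)  # x2 -> {x1 : c(x1, x2) > 0}
    context_totals = Counter()    # x1 -> sum over x2 of c(x1, x2)
    for (x1, x2), c in bigram_counts.items():
        followers[x1].add(x2)
        preceders[x2].add(x1)
        context_totals[x1] += c
    return followers, preceders, context_totals


def adaptive_gamma(x1, gamma0, followers, context_totals):
    """Absolute-discounting style rate: gamma0 * N1+(x1, .) / sum_x2 c(x1, x2)."""
    total = context_totals[x1]  # Counter returns 0 for unseen contexts
    if total == 0:
        return gamma0  # no statistics for this context: fall back to the base rate
    return gamma0 * len(followers[x1]) / total


def kn_proposal(preceders):
    """Continuation distribution: q(x) proportional to N1+(., x)."""
    norm = sum(len(s) for s in preceders.values())
    return {x: len(s) / norm for x, s in preceders.items()}


if __name__ == "__main__":
    f, p, t = bigram_stats("a b a b a c".split())
    print(adaptive_gamma("a", 0.2, f, t), adaptive_gamma("b", 0.2, f, t))
    print(kn_proposal(p))
```

In this toy corpus, "a" has two distinct successors across three bigram occurrences, so its rate is gamma0 * 2/3. "b" is always followed by "a", so it gets gamma0 * 1/2. The effect is that contexts with few distinct continuations are noised less.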