Training/Testing Neural Language Models (NLMs) using Python/Theano
Thang Luong @ 2014, 2015 <lmthang@stanford.edu>

This codebase allows for training feed-forward NLMs, both monolingual
(normal) models and bilingual (joint) models that condition on the source
text as well. The joint NLM is in the context of machine translation (MT)
and replicates the model proposed in BBN's paper
http://acl2014.org/acl2014/P14-1/pdf/P14-1129.pdf, with several differences.

For more details about this code, please refer to our paper:
  Deep Neural Language Models for Machine Translation
  Minh-Thang Luong, Michael Kayser, Christopher D. Manning
  http://www.aclweb.org/anthology/K/K15/K15-1031.pdf

Feature highlights:
(a) Train both monolingual (normal) and bilingual (joint) NLM models.
(b) Support self-normalization.
(c) Include all the preprocessing steps (build the vocabulary, convert text
    into integer format, and extract ngrams for training).
(d) Resume training from a saved model.
(e) Test trained NLMs to produce sentence probabilities.
(f) Support dropout (we haven't tested this feature thoroughly and weren't
    able to achieve gains with it).

Files & Directories:
(a) README.txt - this file.
(b) code/ - contains all the code files, e.g., train_nlm.py and test_nlm.py.
(c) data/ - contains files (train|tune|test).(en|zh|align), where *.align
    contains the alignments for a pair of sentences per line. Each line is a
    series of Chinese position-English position pairs.

Main code:
(a) Train normal NLMs:
      train_nlm.py [options] train_data tune_data test_data ngram_size vocab_size out_prefix
      ./code/train_nlm.py --act_func tanh --learning_rate 0.1 --emb_dim 16 --hidden_layers 64 --log_freq 10 ./data/train.en ./data/tune.en ./data/test.en 11 1000 ./output/toy
    After running the above command, you should get back: the model with the
    best validation perplexity (./output/toy.model), the most recent model
    (./output/toy.model.cur), and a vocab file (./output/toy.vocab). Each
    model comes with a config file containing detailed training information.
    To train more than one hidden layer, change 64 into 64-64 (2 layers),
    32-64-128 (3 layers), etc. To use GPUs, prepend the following text to
    the command: THEANO_FLAGS='device=gpu0'.
(b) Train joint NLMs: add the options --joint --src_lang <str> --tgt_lang <str>:
      THEANO_FLAGS='device=gpu0' python ./code/train_nlm.py --act_func tanh --learning_rate 0.1 --emb_dim 16 --hidden_layers 64 --joint --src_lang zh --tgt_lang en ./data/train ./data/tune ./data/test 5 1000 ./output/toy_joint
    After running the above command, you should get back model files similar
    to (a), plus two vocab files (./output/toy_joint.vocab.en and
    ./output/toy_joint.vocab.zh).
(c) Train self-normalized models:
      THEANO_FLAGS='device=gpu0' python ./code/train_nlm.py --self_norm_coeff 0.1 --act_func tanh --learning_rate 0.1 --emb_dim 16 --hidden_layers 64 --joint --src_lang zh --tgt_lang en ./data/train ./data/tune ./data/test 5 1000 ./output/toy_joint_self
(d) Test NLMs:
      test_nlm.py [options] model_file vocab_file test_file out_file
      THEANO_FLAGS='device=gpu0' python ./code/test_nlm.py --self_norm_coeff 0.1 --act_func tanh --joint --src_lang zh --tgt_lang en --src_file ./data/test.zh --align_file ./data/test.align ./output/toy_joint_self.model ./output/toy_joint_self.vocab ./data/test.en ./output/toy_scores.txt
    Note that test_nlm.py will also try to output a perplexity score. For a
    self-normalized model, remove the option --self_norm_coeff 0.1 to get a
    correct perplexity.

Illustrative sketches of the pieces above (preprocessing, model,
self-normalization, joint source context, perplexity) follow below.
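Illustrative sketches:
The sketches in this section are not the codebase's actual implementation;
they only illustrate, under stated assumptions, what the pieces above do.
All helper names (build_vocab, extract_ngrams, ...) are hypothetical.

First, the preprocessing in feature (c): a minimal sketch of building a
frequency-cutoff vocabulary and extracting fixed-size ngrams for a
feed-forward NLM. The special tokens <unk>, <s>, </s> are assumptions about
the vocabulary layout, not the codebase's exact conventions.

    # Hypothetical sketch of the preprocessing steps: vocabulary building,
    # word-to-id conversion, and ngram extraction.
    from collections import Counter

    def build_vocab(sentences, vocab_size):
        """Keep the vocab_size most frequent words; reserve ids for special tokens."""
        counts = Counter(w for sent in sentences for w in sent)
        words = ['<unk>', '<s>', '</s>'] + [w for w, _ in counts.most_common(vocab_size - 3)]
        return {w: i for i, w in enumerate(words)}

    def extract_ngrams(sentence, vocab, ngram_size):
        """Yield (context, target) id pairs; each context has ngram_size - 1 words."""
        ids = [vocab['<s>']] * (ngram_size - 1)
        ids += [vocab.get(w, vocab['<unk>']) for w in sentence] + [vocab['</s>']]
        for i in range(ngram_size - 1, len(ids)):
            yield ids[i - ngram_size + 1:i], ids[i]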
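Second, the model itself: a numpy sketch of the feed-forward architecture
that --emb_dim, --hidden_layers, and --act_func describe (embedding lookup,
concatenation, tanh hidden layers, softmax). The real code builds the
equivalent computation graph in Theano; this is only a readable stand-in.

    import numpy as np

    def forward(context_ids, emb, hidden_weights, output_weights):
        """context_ids: (ngram_size - 1) word ids -> distribution over the vocab."""
        h = emb[context_ids].reshape(-1)      # concatenate the context embeddings
        for W, b in hidden_weights:           # e.g. 64-64 means two hidden layers
            h = np.tanh(W.dot(h) + b)         # --act_func tanh
        logits = output_weights.dot(h)
        logits -= logits.max()                # shift for numerical stability
        probs = np.exp(logits)
        return probs / probs.sum()            # softmax over the vocabulary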
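Third, self-normalization: --self_norm_coeff follows the objective of the
BBN paper cited above, which penalizes the squared log-partition function so
that unnormalized scores are approximately normalized at test time. A sketch
of the per-example loss, assuming alpha is the value of --self_norm_coeff:

    import numpy as np

    def self_norm_loss(logits, target, alpha):
        """Negative log-likelihood plus the self-normalization penalty."""
        m = logits.max()
        log_Z = np.log(np.exp(logits - m).sum()) + m   # log partition function
        nll = log_Z - logits[target]                   # -log p(target | context)
        return nll + alpha * log_Z ** 2                # alpha = --self_norm_coeff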
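Fourth, the joint model's source context: a hedged sketch of consuming one
*.align line and building a window of source words around the aligned
position, in the style of the BBN joint model. The 'i-j' pair encoding
(Chinese position, then English position) and the window handling are
assumptions, not the exact on-disk conventions of data/*.align.

    def parse_alignment(line):
        """'2-0 0-1 1-2' -> {english_pos: chinese_pos}; keeps the first link per word."""
        links = {}
        for pair in line.split():
            zh, en = pair.split('-')
            links.setdefault(int(en), int(zh))
        return links

    def source_window(src_ids, src_pos, width, pad_id):
        """Window of 2*width+1 source word ids centered on the aligned position."""
        return [src_ids[i] if 0 <= i < len(src_ids) else pad_id
                for i in range(src_pos - width, src_pos + width + 1)]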
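Finally, perplexity from sentence scores: assuming (hypothetically) that
out_file holds one natural-log sentence probability per line, a corpus
perplexity comparable to the one test_nlm.py reports could be recomputed as:

    import math

    def perplexity(sent_logprobs, num_words):
        """Corpus perplexity from natural-log sentence probabilities."""
        return math.exp(-sum(sent_logprobs) / num_words)

    # e.g. two sentences totalling 20 scored tokens:
    print(perplexity([-12.3, -45.6], 20))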
A note on speed: we don't claim that this is a fast implementation, but it
was sufficient for us to obtain consistent gains in real translation tasks.