adaptive_lm: A Python repository from WebSAIL-NU

This project is deprecated and has moved to seqmodel.

This repository is an implementation of an RNNLM (recurrent neural network language model) using TensorFlow (r1.0).

Usage

Preparing data

We provide scripts to download and preprocess public LM datasets; see the shell scripts in data. For any other corpus, prepare train.txt, valid.txt, and test.txt, then run the main preprocessing file in the preprocessing module.
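For a custom corpus, the three files could be produced with something like the sketch below. The helper name split_corpus, the split fractions, and the one-sentence-per-line layout are assumptions made for illustration; they are not part of this repository, and the preprocessing module may expect a different format.

```python
import os
import tempfile


def split_corpus(lines, out_dir, valid_frac=0.1, test_frac=0.1):
    """Hypothetical helper: write train.txt, valid.txt, and test.txt to out_dir.

    Assumes one sentence per line; the split is sequential, not shuffled.
    """
    n = len(lines)
    n_valid = int(n * valid_frac)
    n_test = int(n * test_frac)
    n_train = n - n_valid - n_test
    splits = {
        "train.txt": lines[:n_train],
        "valid.txt": lines[n_train:n_train + n_valid],
        "test.txt": lines[n_train + n_valid:],
    }
    os.makedirs(out_dir, exist_ok=True)
    for name, chunk in splits.items():
        with open(os.path.join(out_dir, name), "w") as f:
            for line in chunk:
                f.write(line.strip() + "\n")
    # Return the number of lines written to each file.
    return {name: len(chunk) for name, chunk in splits.items()}


# Demo on a toy corpus written to a temporary directory.
demo_dir = tempfile.mkdtemp()
demo_counts = split_corpus(["sentence %d" % i for i in range(20)], demo_dir)
```

After the files exist, the preprocessing step described above still has to be run on them.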

Training

You can train a language model with the default options:

python run_lm.py --training --save_config_file train_config.json

This creates a directory named experiments and saves all checkpoints and logs there. By default, the script uses an LSTM cell and trains on the PTB dataset. For other options, run the script with --help.

Testing

The same script can also be used for testing. To reuse a saved configuration file, pass --load_config_filepaht, and override individual settings by providing new values on the command line. For example:

python run_lm.py --load_config_filepaht experiments/out/train_config.json --no-training
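The override behavior can be pictured as a dictionary merge: values loaded from the saved JSON file act as defaults, and any option given explicitly on the command line replaces them. The sketch below only illustrates that idea; merge_config and the batch_size key are hypothetical, and the actual logic in run_lm.py may differ.

```python
import json


def merge_config(saved, overrides):
    """Hypothetical helper: return saved settings updated with explicit overrides.

    An override of None is treated as "not provided" and leaves the saved value.
    """
    merged = dict(saved)
    for key, value in overrides.items():
        if value is not None:
            merged[key] = value
    return merged


# Stand-in for the contents of a saved train_config.json (keys are illustrative).
saved = json.loads('{"training": true, "batch_size": 32}')

# Equivalent of passing --no-training and leaving batch_size unspecified.
merged = merge_config(saved, {"training": False, "batch_size": None})
```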

Extending

The code can be modified at several levels:

feed_dict and fetch: implement new functions that modify the default feed_dict and fetch dictionaries returned by the default methods; see collecting token loss for an example. Note that feed_dict is built by matching the graph node dictionary against the data iterator's batch on shared keys (see map_feeddict(batch, model_feed)).
Initialization and/or minor architecture changes: implement a new BasicRNNHelper.
New architecture: implement a new RNNLM class (see BasicRNNLM for an example).
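The key-matching idea behind feed_dict can be sketched in plain Python. This is not the repository's map_feeddict; the function name map_feeddict_sketch and the keys are illustrative, and strings stand in for TensorFlow placeholder nodes so the example runs without TensorFlow.

```python
def map_feeddict_sketch(batch, model_feed):
    """Pair each graph node with the batch value stored under the same key.

    Keys present in only one of the two dictionaries are skipped.
    """
    feed_dict = {}
    for key, node in model_feed.items():
        if key in batch:
            feed_dict[node] = batch[key]
    return feed_dict


# Strings stand in for graph nodes (placeholders) in this sketch.
model_feed = {"inputs": "inputs_placeholder", "targets": "targets_placeholder"}

# A batch from a data iterator; "extra" has no matching node and is ignored.
batch = {"inputs": [[1, 2, 3]], "targets": [[2, 3, 4]], "extra": None}

feed_dict = map_feeddict_sketch(batch, model_feed)
```

Under this scheme, feeding a new input would mean adding the same key to both the model's node dictionary and the batches produced by the data iterator.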

TODO

Support other cell types: add a command-line argument and improve feed_state(.)
Provide decoder interface
Migrate to TensorFlow r1.1 (contrib.rnn is no longer supported)