This example trains a multi-layer RNN (Elman, GRU, or LSTM) on a language modeling task. By default, the training script uses the provided Penn Treebank (PTB) dataset. The trained model can then be used by the generate script to generate new text.
python main.py --cuda # Train an LSTM on PTB with CUDA (cuDNN). Should reach a perplexity of 113.
python generate.py # Generate samples from the trained LSTM model.
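For readers unfamiliar with the metric, perplexity is conventionally the exponential of the mean per-token negative log-likelihood (cross-entropy in nats). The sketch below illustrates that relationship only; the helper name `perplexity` is hypothetical and this is not necessarily how main.py computes or reports it.

```python
import math


def perplexity(mean_nll: float) -> float:
    """Convert a mean per-token negative log-likelihood (in nats) to perplexity.

    Hypothetical helper for illustration; not taken from main.py.
    """
    return math.exp(mean_nll)


# A model that assigns probability 0.25 to every correct token is as
# uncertain as a uniform choice among 4 words, so its perplexity is about 4.
log_probs = [math.log(0.25)] * 4
mean_nll = -sum(log_probs) / len(log_probs)
print(round(perplexity(mean_nll), 6))

# Conversely, a perplexity of 113 corresponds to a mean loss of
# log(113), roughly 4.73 nats per token.
print(round(math.log(113), 2))
```

Under this definition, lower perplexity means the model spreads its probability over fewer plausible next words.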
The model uses the nn.RNN module (and its sister modules nn.GRU and nn.LSTM), which will automatically use the cuDNN backend if run on CUDA with cuDNN installed.
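As a rough illustration of how a model type string could be mapped onto these modules, the sketch below builds the recurrent layer and runs one forward pass. The helper `build_rnn` and the tensor sizes are assumptions made for the example, not code from main.py; the `nn.RNN`, `nn.GRU`, and `nn.LSTM` constructors and the `nonlinearity` argument are standard PyTorch.

```python
import torch
import torch.nn as nn


def build_rnn(model: str, emsize: int, nhid: int, nlayers: int) -> nn.Module:
    """Return a recurrent module for a model type string (hypothetical helper)."""
    if model in ("LSTM", "GRU"):
        # nn.LSTM and nn.GRU share the (input_size, hidden_size, num_layers) signature.
        return getattr(nn, model)(emsize, nhid, nlayers)
    nonlinearity = {"RNN_TANH": "tanh", "RNN_RELU": "relu"}.get(model)
    if nonlinearity is None:
        raise ValueError(f"unknown model type: {model}")
    # The Elman RNN variants differ only in their activation function.
    return nn.RNN(emsize, nhid, nlayers, nonlinearity=nonlinearity)


# Moving the module and its inputs to a CUDA device is what lets PyTorch
# pick the cuDNN kernels when they are available; on CPU this still runs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
rnn = build_rnn("LSTM", emsize=8, nhid=16, nlayers=2).to(device)

x = torch.randn(5, 3, 8, device=device)  # (seq_len, batch, emsize)
output, hidden = rnn(x)                  # for an LSTM, hidden is a tuple (h_n, c_n)
print(tuple(output.shape))
```

No backend-specific code is needed in the model itself; device placement is the only switch.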
The main.py script accepts the following arguments:
optional arguments:
-h, --help show this help message and exit
--data DATA location of the data corpus
--model MODEL type of recurrent net (RNN_TANH, RNN_RELU, LSTM, GRU)
--emsize EMSIZE size of word embeddings
--nhid NHID number of hidden units per layer
--nlayers NLAYERS number of layers
--lr LR initial learning rate
--clip CLIP gradient clipping
--epochs EPOCHS upper epoch limit
--batch-size N batch size
--bptt BPTT sequence length
--seed SEED random seed
--cuda use CUDA
--log-interval N report interval
--save SAVE path to save the final model
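A command-line interface like the one listed above could be declared with argparse roughly as follows. This is a sketch, not the parser from main.py: the argument types and the `choices` restriction on `--model` are assumptions, and defaults are deliberately omitted because the help text above does not state them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags from the help text (types are assumed, defaults omitted)."""
    parser = argparse.ArgumentParser(description="RNN language model (sketch)")
    parser.add_argument("--data", type=str, help="location of the data corpus")
    parser.add_argument("--model", type=str,
                        choices=["RNN_TANH", "RNN_RELU", "LSTM", "GRU"],  # assumed restriction
                        help="type of recurrent net (RNN_TANH, RNN_RELU, LSTM, GRU)")
    parser.add_argument("--emsize", type=int, help="size of word embeddings")
    parser.add_argument("--nhid", type=int, help="number of hidden units per layer")
    parser.add_argument("--nlayers", type=int, help="number of layers")
    parser.add_argument("--lr", type=float, help="initial learning rate")
    parser.add_argument("--clip", type=float, help="gradient clipping")
    parser.add_argument("--epochs", type=int, help="upper epoch limit")
    parser.add_argument("--batch-size", type=int, metavar="N", help="batch size")
    parser.add_argument("--bptt", type=int, help="sequence length")
    parser.add_argument("--seed", type=int, help="random seed")
    parser.add_argument("--cuda", action="store_true", help="use CUDA")
    parser.add_argument("--log-interval", type=int, metavar="N", help="report interval")
    parser.add_argument("--save", type=str, help="path to save the final model")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(["--model", "GRU", "--batch-size", "20", "--cuda"])
print(args.model, args.batch_size, args.cuda)  # prints: GRU 20 True
```

Note that argparse converts the dashes in `--batch-size` and `--log-interval` to underscores, so the values are read as `args.batch_size` and `args.log_interval`.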