This project is an implementation of a simple encoder-decoder seq2seq model in MXNet. Following OpenNMT, the model is based on:
- a stacked LSTM encoder
- a stacked LSTM decoder
The reference model configuration is:
- 2 LSTM encoder layers
- 2 LSTM decoder layers
- hidden state of 500 units
- embedding size of 500 units
- truncating the vocabulary to the top 50,000 words
- truncating the maximum sentence length to 50 words
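To make the reference configuration concrete, here is a minimal sketch of how such a stacked LSTM encoder-decoder can be assembled with MXNet's symbolic `mx.rnn` cells. The variable names, unrolling scheme, and output projection here are illustrative assumptions, not the actual code in `seq2seq_bucketing.py`:

```python
# Minimal sketch of the reference configuration with MXNet's symbolic RNN cells.
# Names and the unrolling scheme are illustrative, not taken from seq2seq_bucketing.py.
import mxnet as mx

vocab_size = 50000   # top 50,000 words kept after truncation
num_layers = 2       # 2 encoder layers, 2 decoder layers
num_embed = 500      # embedding size
num_hidden = 500     # LSTM hidden state size
seq_len = 50         # maximum sentence length

# Source and target token indices.
src = mx.sym.Variable('src_data')
tgt = mx.sym.Variable('tgt_data')

# Embeddings for source and target vocabularies.
src_embed = mx.sym.Embedding(data=src, input_dim=vocab_size,
                             output_dim=num_embed, name='src_embed')
tgt_embed = mx.sym.Embedding(data=tgt, input_dim=vocab_size,
                             output_dim=num_embed, name='tgt_embed')

# Stacked LSTM encoder.
encoder = mx.rnn.SequentialRNNCell()
for i in range(num_layers):
    encoder.add(mx.rnn.LSTMCell(num_hidden=num_hidden, prefix='enc_l%d_' % i))
_, enc_states = encoder.unroll(length=seq_len, inputs=src_embed,
                               merge_outputs=True)

# Stacked LSTM decoder, initialized with the encoder's final states.
decoder = mx.rnn.SequentialRNNCell()
for i in range(num_layers):
    decoder.add(mx.rnn.LSTMCell(num_hidden=num_hidden, prefix='dec_l%d_' % i))
dec_outputs, _ = decoder.unroll(length=seq_len, inputs=tgt_embed,
                                begin_state=enc_states, merge_outputs=True)

# Project decoder outputs onto the target vocabulary.
pred = mx.sym.Reshape(dec_outputs, shape=(-1, num_hidden))
pred = mx.sym.FullyConnected(data=pred, num_hidden=vocab_size, name='pred')
```

In the sample training command further below, these sizes correspond to the `--num-layers`, `--num-embed`, and `--num-hidden` flags.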
The reference model is normally trained on the WMT15 English-German dataset, which consists of:
- raw Europarl v7
- Common Crawl
- News Commentary v7
For more details, see here.
- Run the `get_opennmt_data.sh` script to download and preprocess the dataset. Preprocessing includes corpora concatenation, shuffling, and a training/validation set split.
- Run the preprocessing step (a rough sketch of the truncation and splitting involved follows after these steps):

  ```
  python preprocess_data.py
  ```

  Fair warning: this can take close to half an hour, but it's a one-time cost. Make sure you have enough RAM, at least 32 GB.
- Run the model. Note that the batch-size setting is global, not per GPU. So, when running on a DGX-1 with a batch size of 128 per GPU, choose a batch size of 128 * 8 = 1,024. Here is a sample command to run the training:
  ```
  python seq2seq_bucketing.py --num-layers 2 --num-embed 500 --num-hidden 500 --optimizer adam --disp-batches 1 --gpus 0,1 --num-epochs 1 --batch-size 256
  ```
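The corpus preparation referred to in the steps above (shuffling, the training/validation split, and the vocabulary and sentence-length truncation from the reference configuration) can be sketched roughly like this. The file names, split ratio, and special tokens are assumptions made for illustration; the actual logic lives in `get_opennmt_data.sh` and `preprocess_data.py`:

```python
# Illustrative sketch of the preprocessing: sentence-length filtering, vocabulary
# truncation, shuffling, and a train/validation split. File names, split ratio,
# and special tokens are assumptions, not taken from preprocess_data.py.
import random
from collections import Counter

MAX_LEN = 50        # truncate the maximum sentence length to 50 words
VOCAB_SIZE = 50000  # keep the top 50,000 words

def load_parallel(src_path, tgt_path):
    """Read a parallel corpus, dropping pairs with an over-long side."""
    pairs = []
    with open(src_path, encoding='utf-8') as fs, open(tgt_path, encoding='utf-8') as ft:
        for src_line, tgt_line in zip(fs, ft):
            src_toks, tgt_toks = src_line.split(), tgt_line.split()
            if len(src_toks) <= MAX_LEN and len(tgt_toks) <= MAX_LEN:
                pairs.append((src_toks, tgt_toks))
    return pairs

def build_vocab(sentences):
    """Keep only the most frequent words; everything else maps to <unk>."""
    counts = Counter(tok for sent in sentences for tok in sent)
    keep = [w for w, _ in counts.most_common(VOCAB_SIZE)]
    return {w: i for i, w in enumerate(['<pad>', '<unk>', '<s>', '</s>'] + keep)}

# Hypothetical file names for the concatenated corpora.
pairs = load_parallel('train.en', 'train.de')
random.shuffle(pairs)

split = int(0.99 * len(pairs))  # assumed split ratio
train_pairs, valid_pairs = pairs[:split], pairs[split:]

src_vocab = build_vocab(s for s, _ in train_pairs)
tgt_vocab = build_vocab(t for _, t in train_pairs)
```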
Many thanks to Eric Xie (@piiswrong), Sheng Zha (@szha) and Antti-Pekka Hynninen (@ap-hynninen) for valuable input.
This project is licensed under the Apache 2.0 license. See the text of the license here.
Note:
Inference isn't implemented yet.