Mutual Information and Diverse Decoding Improve Neural Machine Translation

Implementations of the three models presented in the paper "Mutual Information and Diverse Decoding Improve Neural Machine Translation" by Jiwei Li and Dan Jurafsky.

Requirements

GPU

MATLAB >= R2014b

memory >= 8GB

Folders

Standard: MMI reranking for standard sequence-to-sequence models

Standard/training: training p(t|s) and p(s|t)

Standard/decode: generating N-best list from p(t|s)

Standard/get_s_given_t: generating the score of p(s|t) 

Standard/MMI_rerank: reranking using different features including p(t|s) and p(s|t)

Attention: MMI reranking for attention models. The folders within Attention are organized in the same way as in Standard.

data_gr: A sample of training/dev/testing data.

Pipelines

(1) Training p(t|s) and p(s|t)

cd training

In MATLAB, run LSTM(1) (Standard) or Attention(1) (Attention) to train p(english|german).

In MATLAB, run LSTM(0) or Attention(0) to train p(german|english).
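For intuition, a minimal sketch of the direction swap: p(s|t) is an ordinary sequence-to-sequence model trained on a copy of the parallel corpus with its two sides exchanged. The file names below are hypothetical, not the repo's actual data layout.

    # Hypothetical sketch: p(s|t) is trained exactly like p(t|s), but on a
    # copy of the parallel corpus with the two sides exchanged.
    # The file names below are assumptions for illustration only.
    def swap_parallel_data(src_path, tgt_path, out_src_path, out_tgt_path):
        """Write a copy of the parallel corpus with source and target swapped."""
        with open(src_path) as src, open(tgt_path) as tgt, \
             open(out_src_path, "w") as out_src, open(out_tgt_path, "w") as out_tgt:
            for s_line, t_line in zip(src, tgt):
                out_src.write(t_line)  # the former target is now the source
                out_tgt.write(s_line)  # the former source is now the target

    swap_parallel_data("train.de", "train.en", "train_rev.en", "train_rev.de")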

(2) Generating the N-best list from p(t|s)

cd decode 

In MATLAB, run decode().
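For reference, the diverse decoding variant from the paper penalizes each hypothesis by gamma times its rank among the expansions of the same parent, so that candidates descending from different parents survive the beam cut. A hedged Python sketch of one beam step follows; the data structures are illustrative, not the repo's MATLAB internals.

    # Hedged sketch of one step of the paper's diverse beam search: within the
    # expansions of a single parent hypothesis, the k-th ranked child is
    # penalized by gamma * k before the global top-B cut, diversifying the
    # resulting N-best list. Data structures are illustrative only.
    def diverse_beam_step(beams, expand, beam_size, gamma=1.0):
        """beams: list of (tokens, logprob); expand(tokens) -> [(token, logp), ...]."""
        candidates = []
        for tokens, score in beams:
            # Rank this parent's children by model log probability.
            children = sorted(expand(tokens), key=lambda c: c[1], reverse=True)
            for k, (token, logp) in enumerate(children, start=1):
                # Intra-sibling rank penalty from the paper.
                candidates.append((tokens + [token], score + logp - gamma * k))
        candidates.sort(key=lambda c: c[1], reverse=True)
        return candidates[:beam_size]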

(3) Generating the score of p(s|t)

cd get_s_given_t

(3.a) preparing the data

    python generate_source_target.py 

(3.b) computing p(s|t)

    matlab generate_score()
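For intuition, a hedged sketch of what the preparation step in (3.a) produces: each N-best hypothesis t is paired with its source s, with the hypothesis acting as the input to the reverse model so that p(s|t) can be scored. File formats and names here are assumptions, not the repo's exact layout.

    # Hedged sketch of the data preparation in (3.a): pair every N-best
    # hypothesis with its source sentence. File formats are assumptions,
    # not the repo's exact layout.
    def build_reverse_pairs(sources, nbest):
        """sources: list of sentences; nbest: list of (src_index, hypothesis)."""
        pairs = []
        for src_index, hypothesis in nbest:
            # The reverse model reads the hypothesis and scores the probability
            # of reproducing the original source, giving log p(s|t).
            pairs.append((hypothesis, sources[src_index]))
        return pairs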

(4) Feature reranking

cd MMI_rerank

Use an open-source MERT package. If you don't have MERT, you can do a simple grid search by running:

    python tune_bleu.py
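For reference, the reranking objective from the paper is score(t) = log p(t|s) + lambda * log p(s|t) + gamma * |t|. Below is a hedged Python sketch of the kind of grid search tune_bleu.py might perform; apart from that objective, all names and data structures (including the bleu() scorer) are illustrative assumptions.

    # Hedged sketch of grid search over the reranking weights. The objective
    # log p(t|s) + lambda * log p(s|t) + gamma * |t| follows the paper;
    # everything else (data structures, the bleu() scorer) is illustrative.
    import itertools

    def rerank(nbest, lam, gamma):
        """nbest: per-sentence lists of (hyp_tokens, logp_t_given_s, logp_s_given_t)."""
        best = []
        for hyps in nbest:
            scored = [(lp_ts + lam * lp_st + gamma * len(hyp), hyp)
                      for hyp, lp_ts, lp_st in hyps]
            best.append(max(scored)[1])
        return best

    def grid_search(nbest, references, bleu, lams, gammas):
        """Return the (BLEU, lambda, gamma) triple that maximizes dev-set BLEU."""
        return max((bleu(rerank(nbest, l, g), references), l, g)
                   for l, g in itertools.product(lams, gammas))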

Monolingual features are not currently included.

For any related questions, feel free to contact jiweil@stanford.edu.

@article{li2016mutual,
  title={Mutual Information and Diverse Decoding Improve Neural Machine Translation},
  author={Li, Jiwei and Jurafsky, Dan},
  journal={arXiv preprint arXiv:1601.00372},
  year={2016}
}