This is the code we used in our paper:
Multilingual Neural Machine Translation with Soft Decoupled Encoding
Xinyi Wang, Hieu Pham, Philip Arthur, Graham Neubig
Python 3.6, PyTorch 0.4.1
All the scripts for the experiments in the paper can be created from the templates under scripts/template/.
The data we use is the multilingual TED corpus by Qi et al.
We provide a preprocessed version of the data, which you can get from here:
If you are interested in the details of data processing, take a look at the scripts make-eng.sh and make-data.sh.
The template names for the following methods are:
- SDE: bi-semb-bq-o32000
- subword: bi-sw-32000
- subword-joint: bi-sw-joint-32000
- word: bi-w-64000
To make the main experiment scripts for all 4 languages tested in the paper, simply call:
bash make-cfg.sh
To make decode scripts, use the file make-trans.py. If you modified the template scripts during training, change the name of the directory where the experiment outputs are stored. Otherwise it should just work by calling:
python make-trans.py
If you are interested in the implementation of SDE: all the components of SDE are implemented in an encoder class here. It is an RNN encoder that encodes words using SDE.
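For readers who want a quick picture of what SDE computes for a single word before reading the encoder class, below is a minimal, dependency-free sketch based on our reading of the paper's formulation. It is not the repository's encoder. A word is represented as a bag of character n-grams whose embeddings are summed and passed through tanh (lexical embedding). A per-language linear transform plus tanh produces the language-specific embedding. That vector attends over a shared latent embedding matrix via softmax, and the attended latent vector is added back as a residual. All names (`char_ngrams`, `SoftDecoupledEncoding`, `vocab_words`, the language codes, the `<`/`>` boundary markers, `max_n=4`) are hypothetical choices for illustration. Parameters are random rather than trained, and there is no RNN on top.

```python
import math
import random


def char_ngrams(word, max_n=4):
    """Bag of character n-grams (n = 1..max_n) of a word padded with
    hypothetical boundary markers '<' and '>'; returns {ngram: count}."""
    padded = "<" + word + ">"
    counts = {}
    for n in range(1, max_n + 1):
        for i in range(len(padded) - n + 1):
            gram = padded[i:i + n]
            counts[gram] = counts.get(gram, 0) + 1
    return counts


def matvec(matrix, vec):
    """Plain-Python matrix-vector product."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SoftDecoupledEncoding:
    """Sketch of an SDE-style word embedding with randomly initialised
    (untrained) parameters; illustrative only, not the repo's class."""

    def __init__(self, dim, num_latent, languages, vocab_words, max_n=4, seed=0):
        self.dim = dim
        self.max_n = max_n
        self.rng = random.Random(seed)
        # Character n-gram embeddings for every n-gram seen in vocab_words.
        self.ngram_emb = {}
        for w in vocab_words:
            for gram in char_ngrams(w, max_n):
                if gram not in self.ngram_emb:
                    self.ngram_emb[gram] = self._rand_vector()
        # One dim x dim transform per language.
        self.lang_transform = {lang: self._rand_matrix(dim, dim) for lang in languages}
        # Latent semantic embeddings shared across all languages.
        self.latent = self._rand_matrix(num_latent, dim)

    def _rand_vector(self):
        return [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)]

    def _rand_matrix(self, rows, cols):
        return [[self.rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    def encode(self, word, lang):
        # 1) Lexical embedding: tanh of the count-weighted sum of n-gram
        #    embeddings (n-grams outside the vocabulary are skipped).
        lexical = [0.0] * self.dim
        for gram, count in char_ngrams(word, self.max_n).items():
            emb = self.ngram_emb.get(gram)
            if emb is None:
                continue
            for k in range(self.dim):
                lexical[k] += count * emb[k]
        lexical = [math.tanh(x) for x in lexical]
        # 2) Language-specific transform followed by tanh.
        lang_specific = [math.tanh(x) for x in matvec(self.lang_transform[lang], lexical)]
        # 3) Attend over the shared latent embeddings.
        weights = softmax(matvec(self.latent, lang_specific))
        latent = [sum(w * row[k] for w, row in zip(weights, self.latent))
                  for k in range(self.dim)]
        # 4) Residual connection: latent semantics + language-specific lexical part.
        return [a + b for a, b in zip(latent, lang_specific)]
```

As a usage example, `SoftDecoupledEncoding(dim=8, num_latent=16, languages=["aze", "tur"], vocab_words=["kitab", "kitap"]).encode("kitab", "aze")` returns an 8-dimensional vector. Because the n-gram table is shared, related spellings in a low-resource language and its high-resource neighbor overlap in their lexical embeddings, while the shared latent matrix lets both map into a common semantic space. In the actual model these parameters are trained jointly with the translation objective.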