Implementation of the paper
Balancing Training for Multilingual Neural Machine Translation
Xinyi Wang, Yulia Tsvetkov, Graham Neubig
The preprocessed and binarized data for fairseq can be downloaded here.
To process the data from scratch, see the script
util_scripts/prepare_multilingual_data.sh
The training scripts for many-to-one translation in the related language group (Related M2O) are under the directory job_scripts/related_ted8_m2o/.
Our methods:
MultiDDS-S:
job_scripts/related_ted8_m2o/multidds_s.sh
MultiDDS:
job_scripts/related_ted8_m2o/multidds.sh
Baselines:
Proportional:
job_scripts/related_ted8_m2o/proportional.sh
Temperature:
job_scripts/related_ted8_m2o/temperature.sh
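The Proportional and Temperature baselines differ only in how often each language's training data is sampled. Below is a minimal sketch that assumes the common temperature-based form p_i ∝ (n_i / sum_j n_j)^(1/T). It has not been checked against the scripts above. With T = 1 it reduces to proportional sampling, and larger T pushes the distribution toward uniform. The function name sampling_probs and the sentence counts are hypothetical toy values, not the TED statistics.

```bash
#!/usr/bin/env bash
# Sketch of temperature-based sampling: p_i ∝ (n_i / N)^(1/T).
# ASSUMPTION: this is the generic formulation, not copied from the repo's scripts.
# T=1 gives proportional sampling; larger T flattens toward uniform.
set -euo pipefail

# sampling_probs T n_1 n_2 ... : print one sampling probability per language
# (hypothetical helper for illustration)
sampling_probs() {
  local temp="$1"; shift
  printf '%s\n' "$@" | awk -v t="$temp" '
    { n[NR] = $1; total += $1 }          # collect per-language sizes
    END {
      for (i = 1; i <= NR; i++) { w[i] = (n[i] / total) ^ (1 / t); z += w[i] }
      for (i = 1; i <= NR; i++) printf "%.4f\n", w[i] / z   # renormalize
    }'
}

# Hypothetical toy sizes (sentences) for three languages.
sizes=(100000 10000 1000)
echo "proportional (T=1):"; sampling_probs 1 "${sizes[@]}"
echo "temperature (T=5):";  sampling_probs 5 "${sizes[@]}"
```

With these toy counts, T = 1 gives about 0.90 / 0.09 / 0.01. T = 5 gives about 0.49 / 0.31 / 0.20, so the smallest language is sampled roughly twenty times more often.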
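The scripts for one-to-many translation in the related language group (Related O2M) are under the directory job_scripts/related_ted8_o2m/.
The scripts for many-to-one translation in the diverse language group (Diverse M2O) are under the directory job_scripts/diverse_ted8_m2o/.
The scripts for one-to-many translation in the diverse language group (Diverse O2M) are under the directory job_scripts/diverse_ted8_o2m/.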
Each experiment script directory contains a trans.sh script that translates the test set. For example, to translate the test set with the Related M2O MultiDDS-S model, run:
job_scripts/related_ted8_m2o/trans.sh checkpoints/related_ted8_m2o/multidds_s/
To translate with another experiment, replace the argument with that experiment's checkpoint directory.
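To translate every experiment in one pass, a dry-run loop can print the trans.sh command for each setting and method. This is a sketch only. It assumes checkpoints are saved as checkpoints/<setting>/<method>/ and that the method directory names mirror the training script names. This README only confirms the Related M2O MultiDDS-S path. The function name trans_commands is hypothetical.

```bash
#!/usr/bin/env bash
# Dry run: print (do not execute) one trans.sh command per setting/method pair.
# ASSUMPTION: checkpoint layout is checkpoints/<setting>/<method>/; only
# checkpoints/related_ted8_m2o/multidds_s/ is confirmed by the README.
set -euo pipefail

settings=(related_ted8_m2o related_ted8_o2m diverse_ted8_m2o diverse_ted8_o2m)
# Hypothetical checkpoint directory names, mirroring the training script names.
methods=(multidds_s multidds proportional temperature)

trans_commands() {
  local s m
  for s in "${settings[@]}"; do
    for m in "${methods[@]}"; do
      printf 'job_scripts/%s/trans.sh checkpoints/%s/%s/\n' "$s" "$s" "$m"
    done
  done
}

trans_commands
```

Review the printed list and delete any lines whose checkpoints you have not trained. Then pipe the rest to a shell, e.g. `bash print_trans.sh | bash`. Printing first rather than executing directly makes wrong paths easy to spot.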
Please cite as:
@inproceedings{wang2020multiDDS,
title = {Balancing Training for Multilingual Neural Machine Translation},
author = {Xinyi Wang and Yulia Tsvetkov and Graham Neubig},
booktitle = {ACL},
year = {2020},
}