Code to train models from "ENGINE: Energy-Based Inference Networks for Non-Autoregressive Machine Translation", accepted at ACL 2020. Our code is based on the Mask-Predict code from Facebook AI Research.
We propose to train a non-autoregressive machine translation model to minimize the energy defined by a pretrained autoregressive model. In particular, we view our non-autoregressive translation system as an inference network (Tu and Gimpel, 2018) trained to minimize the autoregressive teacher energy. This contrasts with the popular approach of training a non-autoregressive model on a distilled corpus consisting of the beam-searched outputs of such a teacher model. Our approach, which we call ENGINE (ENerGy-based Inference NEtworks), achieves state-of-the-art non-autoregressive results on the IWSLT 2014 DE-EN and WMT 2016 RO-EN datasets, approaching the performance of autoregressive models.
The pretrained AR (autoregressive) transformer models, the dictionaries, and the CMLMs (conditional masked language models) can be downloaded below. These models are used for ENGINE training. The scripts used to train these models are also provided at the end.
Please let me know if you are interested in the ENGINE models trained in the paper; I do not share them here because of disk space limitations.
With the following commands, you can download the pretrained autoregressive models and CMLM models used in this paper.
- IWSLT14 German-English
wget https://ttic.uchicago.edu/~lifu/ENGINE_IWSLT14DEEN.zip
- WMT16 Romanian-English
wget https://ttic.uchicago.edu/~lifu/ENGINE_WMT16ROEN.zip
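For example, the IWSLT14 archive can be fetched and unpacked as follows (a minimal sketch; the exact contents and file names inside the archive are assumptions, so check the unpacked directory):

# download and unpack the IWSLT14 DE-EN archive
wget https://ttic.uchicago.edu/~lifu/ENGINE_IWSLT14DEEN.zip
unzip ENGINE_IWSLT14DEEN.zip   # contains the pretrained AR energy model, the CMLM checkpoint, and the dictionaries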
- IWSLT'14 German to English (DE-EN) can be obtained with scripts provided in fairseq (see the example after this list).
- WMT'16 Romanian to English (RO-EN) can be obtained from here.
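For IWSLT'14, for example, the data can be prepared from a fairseq checkout roughly as follows (the script path follows fairseq's translation examples; treat this as a sketch rather than the exact recipe used here):

# download and tokenize IWSLT'14 DE-EN with fairseq's example script
cd fairseq/examples/translation
bash prepare-iwslt14.sh   # produces iwslt14.tokenized.de-en/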
Here are two examples of how to preprocess the datasets; please check the scripts for details. Note that in our project we also preprocess the pseudocorpus, i.e., the beam-search outputs from the pretrained autoregressive models.
- IWSLT14 German-English
sh preprocess_iwslt14.sh
- WMT16 Romanian-English
sh preprocess.sh
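Both scripts essentially binarize the tokenized data with the fairseq-style preprocess.py, reusing the downloaded dictionaries so the binarized data matches the pretrained models. A rough, hypothetical sketch for IWSLT14 (the flags follow the standard fairseq preprocessing interface; the dictionary file names and paths are assumptions, so defer to the actual scripts):

python preprocess.py --source-lang de --target-lang en \
  --trainpref iwslt14.tokenized.de-en/train --validpref iwslt14.tokenized.de-en/valid \
  --testpref iwslt14.tokenized.de-en/test --destdir data-bin/iwslt14.de-en \
  --srcdict ENGINE_IWSLT14DEEN/dict.de.txt --tgtdict ENGINE_IWSLT14DEEN/dict.en.txt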
The script train_ENGINE.sh (or train_ENGINE_iwslt.sh for IWSLT14) is used for ENGINE training:
mkdir -p $output_dir; cp ENGINE_WMT16ROEN/$Pretrained_cmlm $output_dir/checkpoint_last.pt
python train_inf.py \
PATH_YOUR_OUTPUT/$text --always-mask --energy_file ENGINE_WMT16ROEN/$energy \
--infnet-toE-feed-type 0 --feed-type 0 --arch bert_transformer_seq2seq \
--share-all-embeddings --update-freq 8 --reset-optimizer --criterion Inf_Energy_Loss \
--label-smoothing 0.1 --lr 0.000001 --alpha 0 --optimizer adam --adam-betas '(0.9, 0.999)' \
--task translation_inf --max-tokens 1024 --weight-decay 0.01 --dropout 0.1 \
--encoder-layers 6 --encoder-embed-dim 512 --decoder-layers 6 --decoder-embed-dim 512 \
--max-source-positions 10000 --max-target-positions 10000 --max-update 300000 --seed 0 \
--save-dir $output_dir
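For reference, here is a minimal, hypothetical setup of the variables used in the command above (the names and values are illustrative only; the real assignments live in train_ENGINE.sh / train_ENGINE_iwslt.sh):

output_dir=checkpoints/engine_wmt16_roen   # where ENGINE checkpoints are written
Pretrained_cmlm=checkpoint_best_cmlm.pt    # CMLM checkpoint from the downloaded archive (file name is an assumption)
energy=checkpoint_best_ar.pt               # pretrained AR energy model (file name is an assumption)
text=data-bin/wmt16.ro-en                  # binarized data directory under PATH_YOUR_OUTPUT (path is an assumption)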
For decoding, one iteration is used (meaning the method is purely non-autoregressive):
python generate_cmlm.py ${output_dir}/data-bin --path ${model_dir}/checkpoint_best.pt --task translation_self --remove-bpe --max-sentences 20 --decoding-iterations 1 --decoding-strategy mask_predict
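Here ${model_dir} refers to the checkpoint directory from the training step (the --save-dir above) and ${output_dir}/data-bin to the binarized data; these variable meanings are assumptions about the script. Although ENGINE is evaluated purely non-autoregressively, the same entry point should also run with iterative refinement; a hedged variant that only changes the iteration count:

python generate_cmlm.py ${output_dir}/data-bin --path ${model_dir}/checkpoint_best.pt --task translation_self --remove-bpe --max-sentences 20 --decoding-iterations 10 --decoding-strategy mask_predict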
The following scripts can be used to train your own models, i.e., the pretrained AR (autoregressive) transformer models and the CMLM.
You can use the script train_AR_Energy.sh (or train_iwslt_AR_Energy.sh for IWSLT14) to train your own transformer AR (autoregressive) energy models.
The script train_CMLM.sh shows an example of how to train the CMLM.
For any issues or thoughts about ENGINE, please open a GitHub issue or contact Lifu Tu (lifu@ttic.edu).
An OpenNMT-py version of the ENGINE code will also be provided soon.
@inproceedings{tu-2020-nat,
title = {{ENGINE}: Energy-Based Inference Networks for Non-Autoregressive Machine Translation},
author = "Tu, Lifu and Pang, Richard Yuanzhe and Wiseman, Sam and Gimpel, Kevin",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
year = "2020",
publisher = "Association for Computational Linguistics",
}