tuformer

To run tuformers, register the tuformer module with any Transformer-based architecture.
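
A minimal sketch of what such a registration could look like with fairseq's architecture registry, assuming the tuformer model in this repo is registered under the name 'tuformer' (the function and module names below are illustrative; the hyperparameters simply mirror the standard IWSLT de-en Transformer):

# sketch only: assumes a 'tuformer' model is already registered via @register_model('tuformer')
from fairseq.models import register_model_architecture
from fairseq.models.transformer import base_architecture


@register_model_architecture('tuformer', 'tuformer_iwslt_de_en')
def tuformer_iwslt_de_en(args):
    # Reuse the standard IWSLT de-en Transformer hyperparameters.
    args.encoder_embed_dim = getattr(args, 'encoder_embed_dim', 512)
    args.encoder_ffn_embed_dim = getattr(args, 'encoder_ffn_embed_dim', 1024)
    args.encoder_attention_heads = getattr(args, 'encoder_attention_heads', 4)
    args.encoder_layers = getattr(args, 'encoder_layers', 6)
    args.decoder_embed_dim = getattr(args, 'decoder_embed_dim', 512)
    args.decoder_ffn_embed_dim = getattr(args, 'decoder_ffn_embed_dim', 1024)
    args.decoder_attention_heads = getattr(args, 'decoder_attention_heads', 4)
    args.decoder_layers = getattr(args, 'decoder_layers', 6)
    base_architecture(args)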

For example, to perform an NMT task with the tuformer version of the iwslt_de_en model, run the command below. All other setup remains the same.

CUDA_VISIBLE_DEVICES=0 fairseq-train \
    data-bin/iwslt14.tokenized.de-en \
    --arch tuformer_iwslt_de_en --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.3 --weight-decay 0.0001 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 \
    --eval-bleu \
    --eval-bleu-args '{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}' \
    --eval-bleu-detok moses \
    --eval-bleu-remove-bpe \
    --eval-bleu-print-samples \
    --best-checkpoint-metric bleu --maximize-best-checkpoint-metric
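
After training, translations can be generated and scored with fairseq-generate; a minimal sketch, assuming checkpoints are written to the default checkpoints/ directory:

CUDA_VISIBLE_DEVICES=0 fairseq-generate data-bin/iwslt14.tokenized.de-en \
    --path checkpoints/checkpoint_best.pt \
    --batch-size 128 --beam 5 --remove-bpe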