Learning Deep Transformer Models for Machine Translation on Fairseq

The implementation of "Learning Deep Transformer Models for Machine Translation" (ACL 2019) by Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, and Lidia S. Chao.

This code is based on Fairseq v0.5.0

Installation

  1. pip install -r requirements.txt
  2. python setup.py develop
  3. python setup.py install

NOTE: tested with torch==0.4.1
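
If the installation succeeded, a quick check such as the one below should import the locally installed fairseq package and report the PyTorch version. This is a minimal convenience sketch, not part of the repository:

import torch
import fairseq  # the package installed by "python setup.py develop" / "python setup.py install"

print("torch", torch.__version__)  # the authors report testing with torch==0.4.1
if not torch.__version__.startswith("0.4"):
    print("warning: this code was only tested with torch==0.4.1")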

Prepare Training Data

  1. Download the preprocessed WMT'16 En-De dataset provided by Google to the project root directory

  2. Generate the binarized dataset at data-bin/wmt16_en_de_google:

bash runs/prepare-wmt-en2de.sh
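
After the script finishes, you can optionally confirm that the binarized data is in place. The snippet below is only a sketch; the file names it mentions (dictionaries such as dict.en.txt / dict.de.txt) are assumptions about fairseq's binarized output, not something this repository specifies:

import os

data_dir = "data-bin/wmt16_en_de_google"
if not os.path.isdir(data_dir):
    raise SystemExit("missing %s; run bash runs/prepare-wmt-en2de.sh first" % data_dir)

# List whatever preprocessing produced; source/target dictionaries and
# binarized train/valid/test splits are expected to be among these files.
for name in sorted(os.listdir(data_dir)):
    print(name)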

Train

Train deep pre-norm baseline (20-layer encoder)

bash runs/train-wmt-en2de-deep-prenorm-baseline.sh

Train deep post-norm DLCL (25-layer encoder)

bash runs/train-wmt-en2de-deep-postnorm-dlcl.sh

Train deep pre-norm DLCL (30-layer encoder)

bash runs/train-wmt-en2de-deep-prenorm-dlcl.sh

NOTE: BLEU is calculated automatically when training finishes
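
For readers unfamiliar with DLCL, the sketch below illustrates the idea behind these models: the input of each layer is a learned, layer-specific weighted combination of the outputs of all preceding layers (including the embedding), instead of only the previous layer's output. This is a simplified, post-norm-style illustration with assumed names, not the repository's actual implementation:

import torch
import torch.nn as nn

class DynamicLinearCombination(nn.Module):
    # Keeps the outputs of all previous layers and mixes them with learned
    # weights before feeding the next layer (simplified DLCL sketch).
    def __init__(self, num_layers, dim):
        super().__init__()
        # Row l holds the mixing weights used to build the input of layer
        # l + 1 from outputs 0..l (index 0 is the embedding).
        self.weights = nn.Parameter(torch.zeros(num_layers + 1, num_layers + 1))
        with torch.no_grad():
            for l in range(num_layers + 1):
                self.weights[l, : l + 1] = 1.0 / (l + 1)  # start from an average
        # Post-norm-style variant: normalize each stored output before mixing.
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(num_layers + 1)])
        self.history = []

    def push(self, x):
        # Record a new layer output (call first with the embedding).
        self.history.append(self.norms[len(self.history)](x))

    def pop(self):
        # Weighted sum over all recorded outputs = input of the next layer.
        l = len(self.history) - 1
        w = self.weights[l, : l + 1]
        stacked = torch.stack(self.history, dim=0)         # (l + 1, B, T, C)
        return (w.view(-1, 1, 1, 1) * stacked).sum(dim=0)

# Toy usage with a stand-in for real Transformer layers:
dlcl = DynamicLinearCombination(num_layers=2, dim=8)
x = torch.randn(4, 5, 8)      # (batch, time, channels)
dlcl.push(x)                  # embedding
for _ in range(2):
    h = dlcl.pop()            # aggregated input of this layer
    dlcl.push(h + 1.0)        # a real model would apply a Transformer layer here
encoder_output = dlcl.pop()

A real encoder would reset the history at the start of each forward pass and interleave this combination with the standard self-attention and feed-forward sublayers; the training scripts above train the full models.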

Results

Model | #Param. | Epoch* | BLEU
Transformer (base) | 65M | 20 | 27.3
Transparent Attention (base, 16L) | 137M | - | 28.0
Transformer (big) | 213M | 60 | 28.4
RNMT+ (big) | 379M | 25 | 28.5
Layer-wise Coordination (big) | 210M* | - | 29.0
Relative Position Representations (big) | 210M | 60 | 29.2
Deep Representation (big) | 356M | - | 29.2
Scaling NMT (big) | 210M | 70 | 29.3
Our deep pre-norm Transformer (base, 20L) | 106M | 20 | 28.9
Our deep post-norm DLCL (base, 25L) | 121M | 20 | 29.2
Our deep pre-norm DLCL (base, 30L) | 137M | 20 | 29.3

NOTE: * denotes approximate values.