
Neural Machine Translation

  • Transformer model for Neural Machine Translation from Russian to English
  • PyTorch implementation of "Attention Is All You Need" by Ashish Vaswani et al. (arXiv:1706.03762)
  • Implementation of the attention-head pruning method of Elena Voita et al. (arXiv:1905.09418)

Data for training

  • Dataset: OpenSubtitles v2018 from OPUS (https://opus.nlpl.eu/OpenSubtitles-v2018.php)
  • Total number of sentence pairs in the corpus: ~26M
  • Train / validation / test split: ~9M / ~250K / ~250K sentence pairs (see the sketch below)
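A minimal sketch of how such a split could be produced from the raw parallel files; the file names and the fixed seed are assumptions, not the repo's actual preprocessing.

```python
import random

# Hypothetical file names; the OPUS release ships aligned .ru/.en text files.
with open("OpenSubtitles.en-ru.ru", encoding="utf-8") as f_src, \
     open("OpenSubtitles.en-ru.en", encoding="utf-8") as f_tgt:
    pairs = list(zip(f_src, f_tgt))  # ~26M (ru, en) sentence pairs

random.seed(42)  # fixed seed for a reproducible split
random.shuffle(pairs)

train = pairs[:9_000_000]            # ~9M pairs
valid = pairs[9_000_000:9_250_000]   # ~250K pairs
test  = pairs[9_250_000:9_500_000]   # ~250K pairs
```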

Architecture

  • 6 layers, 8 heads, ~26M parameters
  • max sequence length = 100, hidden dim = 256, position-wise feed-forward dim = 512 (configuration sketched below)
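The repo implements the model from scratch, but the stated sizes map roughly onto PyTorch's built-in nn.Transformer; the sketch below approximates the configuration, not the repo's actual module layout. With the 30K/20K vocabularies, the embedding and output layers account for most of the ~26M parameters.

```python
import torch.nn as nn

# Approximate configuration: d_model = hidden dim = 256,
# dim_feedforward = pf dim = 512, 6 layers, 8 heads.
transformer = nn.Transformer(
    d_model=256,
    nhead=8,
    num_encoder_layers=6,
    num_decoder_layers=6,
    dim_feedforward=512,
    dropout=0.1,
    batch_first=True,
)

src_embed = nn.Embedding(30_000, 256)  # source (Russian) vocabulary
tgt_embed = nn.Embedding(20_000, 256)  # target (English) vocabulary
pos_embed = nn.Embedding(100, 256)     # positions up to max len = 100 (learned here; could be sinusoidal)
generator = nn.Linear(256, 20_000)     # projection onto the target vocabulary
```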

Tokenization

  • Fast Byte Pair Encoding (youtokentome)
  • Source / target vocabulary size: 30K / 20K tokens (usage sketched below)
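youtokentome's API makes the two-vocabulary setup a few lines; the file names here are illustrative.

```python
import youtokentome as yttm

# Train separate BPE models for the source and target sides.
yttm.BPE.train(data="train.ru", model="ru.model", vocab_size=30_000)
yttm.BPE.train(data="train.en", model="en.model", vocab_size=20_000)

bpe_ru = yttm.BPE(model="ru.model")
ids = bpe_ru.encode(
    ["Привет, мир!"],                # "Hello, world!"
    output_type=yttm.OutputType.ID,
    bos=True, eos=True,              # add <BOS>/<EOS> markers
)
```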

Training

  • Dropout 0.1, gradient clipping at 1.0
  • Loss function: cross-entropy; target metric: BLEU
  • 10 epochs, ~25 hours on Kaggle and Google Colab GPUs
  • Loader: 128 sentence pairs / batch, 100 batches / chunk
  • Optimizer: Adam, lr 0.0005, cosine schedule with 70K warm-up steps (sketched below)
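PyTorch has no single built-in warm-up-plus-cosine schedule, so a LambdaLR along these lines reproduces the stated settings; total_steps is an assumption (roughly epochs × batches per epoch), and model refers to the architecture sketch above.

```python
import math
import torch

pad_id = 0  # youtokentome's default pad id
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
criterion = torch.nn.CrossEntropyLoss(ignore_index=pad_id)

warmup_steps, total_steps = 70_000, 700_000  # total_steps is an assumption

def lr_lambda(step):
    if step < warmup_steps:
        return step / max(1, warmup_steps)                 # linear warm-up
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))      # cosine decay to 0

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Per training step:
#   loss.backward()
#   torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clipping at 1.0
#   optimizer.step(); scheduler.step(); optimizer.zero_grad()
```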

Pruning

  • λ = 0.05, β = 0.50, 70K iterations
  • Gumbel noise, Hard Concrete gates
  • Relaxed L0 penalty on the expected number of open attention heads (gate sketched below)
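A minimal sketch of a Hard Concrete gate in the spirit of Voita et al. (following the Louizos et al. relaxation). Each head's output is multiplied by its gate, and λ = 0.05 weights the L0 penalty in the total loss; the stretch interval is a common default, and reading β = 0.50 as the gate temperature is a guess, not confirmed from the repo.

```python
import math
import torch

class HardConcreteGate(torch.nn.Module):
    """One stochastic gate per attention head (Louizos et al.-style relaxation)."""

    def __init__(self, n_heads, temperature=0.5, stretch=(-0.1, 1.1)):
        super().__init__()
        self.log_alpha = torch.nn.Parameter(torch.zeros(n_heads))
        self.temperature = temperature
        self.l, self.r = stretch  # stretch interval lets gates hit exactly 0 or 1

    def forward(self):
        if self.training:
            # Logistic (Gumbel-difference) noise for the concrete relaxation.
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid(
                (u.log() - (-u).log1p() + self.log_alpha) / self.temperature
            )
        else:
            s = torch.sigmoid(self.log_alpha / self.temperature)
        s = s * (self.r - self.l) + self.l
        return s.clamp(0.0, 1.0)  # "hard" gate values in [0, 1]

    def l0_penalty(self):
        # Expected number of open gates: P(gate != 0) summed over heads.
        shift = self.temperature * math.log(-self.l / self.r)
        return torch.sigmoid(self.log_alpha - shift).sum()

# In the pruning objective: loss = ce_loss + 0.05 * gate.l0_penalty()  (λ = 0.05)
```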

Inference

  • Greedy and beam-search generation (beam width k = 4); a greedy-decoding sketch follows
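A minimal greedy-decoding sketch; encode()/decode() and the special-token ids are assumptions about the repo's interface (bos_id = 2 and eos_id = 3 are youtokentome's defaults). Beam search would instead keep the k = 4 highest-scoring partial hypotheses at every step.

```python
import torch

@torch.no_grad()
def greedy_decode(model, src_ids, bos_id=2, eos_id=3, max_len=100):
    """Greedy generation: pick the argmax token at every step."""
    memory = model.encode(src_ids)              # hypothetical encode()
    out = torch.full((1, 1), bos_id, dtype=torch.long)
    for _ in range(max_len - 1):
        logits = model.decode(out, memory)      # hypothetical decode()
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        out = torch.cat([out, next_id], dim=1)
        if next_id.item() == eos_id:
            break
    return out.squeeze(0)                       # token ids incl. <BOS>/<EOS>
```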

You can download and run it yourself, or you can use the bot. To run it locally:

# create and activate a virtual environment
python3 -m venv env
source env/bin/activate
# install dependencies
pip install -r requirements.txt
# download pretrained artifacts from Google Drive
gdown 1heNu80X8DcTKTx2Od0-EW-6JrkXxk5Ze
gdown 1c4LakbKi7-gbKyAvcoGkJ8Yic16wvJx0
gdown 1I46t9Qgz0NbXjT-EPbogEUYpvGPTc408
# start the bot with your owner id and token
python3 bot.py <bot_owner_id> <bot_token>