
Vanilla Transformer

Unofficial PyTorch implementation of the original Transformer from the paper "Attention Is All You Need" (Vaswani et al., 2017).



Dataset

english_dataset.txt and spanish_dataset.txt contain a selection of English–Spanish sentence pairs from the Tatoeba Project; see the Tab-delimited Bilingual Sentence Pairs collection. The sentences were preprocessed by lowercasing them and removing special characters.
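The exact preprocessing script is not shown here, but a minimal sketch of the lowercasing and special-character stripping described above could look like this (the function name and the decision to keep accented letters for Spanish are assumptions):

```python
import re

def preprocess(sentence: str) -> str:
    # Lowercase, then drop anything that is not a word character or
    # whitespace. \w matches Unicode letters by default in Python 3,
    # so accented Spanish characters are preserved (an assumption
    # about the desired behavior).
    sentence = sentence.lower()
    sentence = re.sub(r"[^\w\s]", "", sentence)
    # Collapse runs of whitespace left behind by removed characters.
    return " ".join(sentence.split())

print(preprocess("¿Dónde está la biblioteca?"))  # dónde está la biblioteca
```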

Modifications and Comments

  • The position of Layer Normalization matters: the original paper applies it after each residual connection (Post-LN), while applying it before each sub-layer (Pre-LN) trains more stably without warmup. See Xiong et al., "On Layer Normalization in the Transformer Architecture".
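The two placements can be sketched as small PyTorch wrappers (the class names are illustrative, not this repo's actual code; dropout is omitted for brevity):

```python
import torch
import torch.nn as nn

class PostLNBlock(nn.Module):
    """Post-LN, as in the original paper: LayerNorm(x + Sublayer(x))."""
    def __init__(self, d_model: int, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize AFTER the residual addition.
        return self.norm(x + self.sublayer(x))

class PreLNBlock(nn.Module):
    """Pre-LN, as in Xiong et al.: x + Sublayer(LayerNorm(x))."""
    def __init__(self, d_model: int, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize BEFORE the sub-layer; the residual path stays clean,
        # which keeps gradients well-behaved in deep stacks.
        return x + self.sublayer(self.norm(x))
```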

TODO

  • Learning rate scheduler: we currently use Adam with a fixed learning rate of 1e-4, whereas the paper uses Adam with a warmup-then-decay schedule.
  • Train and test the model on a larger, real-world dataset.
  • BLEU metric.
  • Beam Search.
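For the scheduler TODO, the paper's schedule (often called the "Noam" schedule) increases the learning rate linearly for warmup steps and then decays it proportionally to the inverse square root of the step number. A self-contained sketch, with the paper's defaults of d_model=512 and 4000 warmup steps:

```python
def noam_lr(step: int, d_model: int = 512, warmup: int = 4000) -> float:
    """lr = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5).

    Rises linearly for the first `warmup` steps, then decays as
    step^-0.5; the peak is reached exactly at step == warmup.
    """
    step = max(step, 1)  # guard against step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
```

This plugs into PyTorch as a multiplicative factor via torch.optim.lr_scheduler.LambdaLR (set the optimizer's base lr to 1.0 so the lambda supplies the full value).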