A PyTorch implementation of *Attention Is All You Need*, based on Umar Jamil's *Coding a Transformer from Scratch* video.
To set up and run this project:
- Create a new environment with the provided `requirements.txt` file:

  ```bash
  virtualenv venv
  source venv/bin/activate
  pip3 install -r requirements.txt
  ```
- Log in to your wandb account to track the training process:

  ```bash
  wandb login
  ```
- Start model training:

  ```bash
  python train.py
  ```
- Optionally, redirect Python output to a log file:

  ```bash
  python train.py > log.txt
  ```
- The default hyperparameters are defined in `config.py` and can be modified as needed; a sketch of what that configuration might look like follows this list.
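In repositories following Umar Jamil's video, the configuration is typically a plain dictionary returned by a helper in `config.py`. The sketch below is an assumption reconstructed from the hyperparameters listed in the next section; check `config.py` itself for the actual keys and values:

```python
# Illustrative sketch of config.py; key names and values are assumptions
# reconstructed from the hyperparameters listed below, not the real file.
def get_config():
    return {
        "batch_size": 8,
        "num_epochs": 20,       # the provided checkpoints cover 20 epochs
        "lr": 10**-4,           # Adam learning rate
        "seq_len": 350,
        "d_model": 512,
        "model_folder": "weights",  # where tmodel_XX.pt checkpoints are written
    }
```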
Two vanilla Transformer configurations were used:

- Vanilla Transformer Model (configuration 1)
  - Batch size: 8
  - `d_model`: 512
  - `seq_len`: 350
  - Learning rate: 1e-4
  - Optimizer: Adam
- Vanilla Transformer Model (configuration 2)
  - Batch size: 16
  - `d_model`: 512
  - `seq_len`: 250
  - Learning rate: 1e-4
  - Optimizer: Adam
The models were initially trained for English-to-Italian and English-to-Russian translation, albeit with limited resources. The author trained each model for 20 epochs; you can continue training from the provided checkpoints if needed. The training logs can be found here.
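Those logs are produced through wandb (hence the `wandb login` step above). The snippet below is a minimal sketch of how metrics are typically reported during training; the project name and metric key are placeholders, not necessarily what `train.py` uses:

```python
import wandb

# Minimal sketch of wandb tracking; the project name and metric key are
# placeholders -- see train.py for what is actually logged.
run = wandb.init(project="transformer-from-scratch")

for step, loss in enumerate([1.9, 1.4, 1.1]):  # stand-in for the real training loop
    wandb.log({"train/loss": loss}, step=step)

run.finish()
```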
To resume training from a checkpoint:
- Create a `weights` directory in the root folder:

  ```bash
  mkdir weights
  ```
- Download the model weights from this link and place them in the `weights` directory.
- The directory structure should resemble:
  ```
  tree weights/
  weights/
  ├── tmodel_00.pt
  ├── tmodel_01.pt
  ├── tmodel_02.pt
  ...
  ├── tmodel_18.pt
  └── tmodel_19.pt
  0 directories, 20 files
  ```
- Continue training as usual:

  ```bash
  python train.py
  ```
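Under the hood, resuming amounts to restoring the model and optimizer state from the latest checkpoint before the training loop starts. The sketch below illustrates the usual pattern; the checkpoint key names are assumptions, so check `train.py` for the format it actually saves:

```python
import torch

# Minimal sketch of resuming from a checkpoint. The key names
# ("model_state_dict", "optimizer_state_dict", "epoch") are assumptions;
# train.py defines the actual checkpoint format.
def load_checkpoint(model, optimizer, path="weights/tmodel_19.pt"):
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state_dict"])          # restore weights
    optimizer.load_state_dict(checkpoint["optimizer_state_dict"])  # restore Adam state
    return checkpoint["epoch"] + 1                                 # epoch to resume from
```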
This implementation includes a demo app built on Streamlit for model inference. Make sure the appropriate model weights are placed in the `weights` folder; for instance, to use the English-to-Russian model, download the corresponding weights and place them there. Once the weights are in place, start the app with:

```bash
streamlit run app.py
```
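For orientation, a Streamlit inference app of this kind usually loads the trained weights once and exposes a translate function behind a text box. The sketch below is illustrative only: `build_model()` and `translate()` are hypothetical stand-ins for whatever helpers `app.py` actually uses:

```python
import streamlit as st
import torch

# Illustrative Streamlit app structure; build_model() and translate() are
# hypothetical stand-ins for the helpers app.py actually defines.

@st.cache_resource  # construct and load the model once, not on every rerun
def load_model():
    model = build_model()  # hypothetical: build the Transformer as in train.py
    state = torch.load("weights/tmodel_19.pt", map_location="cpu")
    model.load_state_dict(state["model_state_dict"])
    model.eval()
    return model

st.title("Transformer translation demo")
source_text = st.text_input("Enter a source sentence:")
if st.button("Translate") and source_text:
    st.write(translate(load_model(), source_text))  # hypothetical decoding helper
```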