This project is originally based on a-PyTorch-Tutorial-to-Transformers. I have made heavy modifications, reworking the hyperparameter configuration and implementing several techniques from other papers to improve the model's performance (rough sketches of two of these techniques follow the list below). The papers involved include:
- Attention Is All You Need
- Lessons on Parameter Sharing across Layers in Transformers
- RoFormer: Enhanced Transformer with Rotary Position Embedding
- ReZero is All You Need: Fast Convergence at Large Depth
- On Layer Normalization in the Transformer Architecture
- Understanding the Difficulty of Training Transformers
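To illustrate the kind of techniques drawn from these papers, here is a minimal PyTorch sketch of the ReZero residual connection ("ReZero is All You Need"): each residual branch is scaled by a learnable scalar initialized to zero, so every layer starts out as the identity map. This is an illustrative sketch, not the module used in this repo; `ReZeroResidual` and its signature are hypothetical names.

```python
import torch
import torch.nn as nn

class ReZeroResidual(nn.Module):  # hypothetical name, not this repo's module
    """Wrap a sublayer so its output is gated by a learnable scalar (ReZero)."""

    def __init__(self, sublayer: nn.Module):
        super().__init__()
        self.sublayer = sublayer
        # alpha starts at zero, so the wrapped layer initially contributes nothing
        self.alpha = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.alpha * self.sublayer(x)
```

Similarly, a rough sketch of rotary position embedding (RoFormer): instead of adding positional encodings, pairs of query/key channels are rotated by position-dependent angles. Again, the function name and tensor layout are assumptions for illustration, not this repo's actual code.

```python
import torch

def apply_rotary(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate pairs of channels by position-dependent angles (RoPE).

    x: (batch, seq_len, n_heads, head_dim) with an even head_dim.
    """
    _, seq_len, _, head_dim = x.shape
    # One frequency per channel pair, decaying geometrically with channel index
    inv_freq = base ** (-torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos = angles.cos()[None, :, None, :]  # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin  # 2-D rotation of each channel pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```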
Results are documented in a spreadsheet here.
Example TensorBoard graph of validation loss across the different runs: