pi-tau/transformer
A Transformer model implemented from scratch in PyTorch. The embedding layers and the pre-softmax linear layer share weights. The repository also demonstrates training on the Multi30k machine translation task.
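The weight sharing mentioned above can be illustrated with a minimal sketch. This is not the repository's code: the class name `TiedEmbeddingHead` and its methods `encode` and `logits` are hypothetical, chosen only for illustration. The `sqrt(d_model)` scaling follows the original Transformer paper; whether this repository applies it is not stated here.

```python
import math

import torch
import torch.nn as nn


class TiedEmbeddingHead(nn.Module):
    """Hypothetical module: one weight matrix serves as both the
    token embedding table and the pre-softmax projection."""

    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.d_model = d_model
        self.embed = nn.Embedding(vocab_size, d_model)
        # Linear(d_model -> vocab_size) stores its weight as
        # (vocab_size, d_model), the same shape as the embedding table.
        self.proj = nn.Linear(d_model, vocab_size, bias=False)
        # Tie the two layers: both now point at the same Parameter.
        self.proj.weight = self.embed.weight

    def encode(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Scale embeddings by sqrt(d_model), as in the original paper
        # (an assumption about this sketch, not about the repository).
        return self.embed(token_ids) * math.sqrt(self.d_model)

    def logits(self, hidden: torch.Tensor) -> torch.Tensor:
        # Project hidden states back to vocabulary logits.
        return self.proj(hidden)


model = TiedEmbeddingHead(vocab_size=10, d_model=4)
ids = torch.tensor([[1, 2, 3]])
hidden = model.encode(ids)        # shape: (1, 3, 4)
scores = model.logits(hidden)     # shape: (1, 3, 10)
```

Because the two layers hold the same `Parameter`, the shared matrix is counted once in `model.parameters()` and receives gradients from both the embedding lookup and the output projection.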
Python
Issues (1)
Embedding scaling
#1 opened by jamesanto