nanoTransformer

A nano implementation of the Transformer from the "Attention Is All You Need" paper (Vaswani et al., 2017).

The model uses character-level tokens rather than word-level tokens.
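
As an illustration of what character-level tokenization looks like, here is a minimal sketch (not the repository's actual tokenizer); it assumes the vocabulary is simply the set of characters seen in the training corpus:

```python
# Minimal character-level tokenizer sketch (illustrative, not the repo's code).
# Assumes the vocabulary is built from the unique characters in the corpus.

corpus = "hello world -> bonjour le monde"

# Build the vocabulary from every unique character in the corpus.
chars = sorted(set(corpus))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
itos = {i: ch for ch, i in stoi.items()}      # integer id -> character

def encode(text: str) -> list[int]:
    """Map a string to a list of character token ids."""
    return [stoi[ch] for ch in text]

def decode(ids: list[int]) -> str:
    """Map a list of character token ids back to a string."""
    return "".join(itos[i] for i in ids)

if __name__ == "__main__":
    ids = encode("hello")
    print(ids)           # token ids depend on the corpus
    print(decode(ids))   # "hello"
```

Working at the character level keeps the vocabulary tiny at the cost of longer sequences, which suits a nano-scale model.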

It is trained to perform English-to-French translation.

The training and validation loss curves over 50k training iterations are shown below: