/training-of-transformer-on-dummy-data

Here we try to understand how the transformer works by replicating the architecture from the published paper. We also train a simple version of the architecture on a dummy dataset.

Primary language: Jupyter Notebook. License: Apache License 2.0 (Apache-2.0).
