The Transformer architecture, introduced by Google researchers in the paper "Attention Is All You Need" (2017), is a key innovation in deep learning for natural language processing (NLP). It is the foundational model for many advanced NLP applications.
Models like ChatGPTdeveloped by OpenAI, are built on variations of the Transformer architecture. While ChatGPT is based on the GPT (Generative Pre-trained Transformer) model, which extends the Transformer architecture, it incorporates additional techniques and training strategies to enhance its performance.
Transformer is a neural network architecture that performs sequence to sequence learning:
To run the code you just need to have python 3.10 installed in your system. Then run following command:
make all
It will download the data and run the training.