Pre-train a character-level language model on a Shakespeare dataset using a decoder-only transformer.
The multi-head attention transformer is implemented from scratch rather than using a pre-built transformer library.
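The core of the decoder-only transformer is causal multi-head self-attention. The sketch below is illustrative only (not the repo's code): it uses identity projections in place of the learned W_q/W_k/W_v matrices to show the head-splitting, scaled dot-product scores, and causal masking:

```python
import numpy as np

def causal_self_attention(x, n_heads):
    """Multi-head causal self-attention over a (T, C) sequence.
    Projections are identity here for brevity; a real model learns
    separate query/key/value weight matrices per head."""
    T, C = x.shape
    hs = C // n_heads                               # head size
    mask = np.tril(np.ones((T, T), dtype=bool))     # causal mask
    out = np.zeros_like(x)
    for h in range(n_heads):
        q = k = v = x[:, h * hs:(h + 1) * hs]       # identity projections (sketch)
        scores = q @ k.T / np.sqrt(hs)              # scaled dot-product
        scores = np.where(mask, scores, -np.inf)    # block attention to the future
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)          # softmax over allowed positions
        out[:, h * hs:(h + 1) * hs] = w @ v
    return out
```

Because of the causal mask, position 0 can only attend to itself, and each head operates on its own slice of the channel dimension.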
input.txt contains the Shakespeare dataset.
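A character-level model tokenizes the text one character at a time, so the vocabulary is simply the set of unique characters in the dataset. A minimal sketch of this tokenization (the function name is illustrative, not from the repo):

```python
def build_char_tokenizer(text):
    """Character-level tokenizer: each unique character maps to an integer id."""
    chars = sorted(set(text))                        # vocabulary
    stoi = {ch: i for i, ch in enumerate(chars)}     # char -> id
    itos = {i: ch for ch, i in stoi.items()}         # id -> char
    encode = lambda s: [stoi[c] for c in s]
    decode = lambda ids: ''.join(itos[i] for i in ids)
    return encode, decode, len(chars)
```

For example, `encode` turns a string into a list of integers and `decode` inverts it, so `decode(encode(s)) == s` for any string drawn from the dataset's characters.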
Bigram.ipynb walks through a basic bigram-level language model as a baseline.
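A bigram model predicts the next character from the current character alone. A minimal count-based sketch of the idea (the notebook trains a neural lookup-table version, but the statistics are the same):

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Estimate P(next_char | current_char) from raw bigram counts."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):     # slide over adjacent character pairs
        counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}  # normalize counts to probabilities
```

Sampling from these conditional distributions one character at a time already produces Shakespeare-flavored gibberish, which is the baseline the transformer improves on.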
Run gpt.py to start pre-training.