Primary Language: Python
The code is based on Andrej Karpathy's GPT-2 video, "Let's reproduce GPT-2 (124M)".