An educational, minimalist implementation of GPT-style language models in PyTorch, following Andrej Karpathy's video walkthrough. This repo includes both a Bigram Language Model and a Transformer-based GPT, modularized for clarity and extensibility.
This project walks through building a GPT-style model from scratch, with the following components:
- A Bigram Language Model for foundational understanding (see the sketch after this list)
- A Transformer-based GPT with:
  - Tokenizer
  - Self-attention mechanism
  - LayerNorm & MLP
  - Training loop with cross-entropy loss
- Trained and evaluated on the Shakespeare dataset using Google Colab
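
As a rough sketch of the bigram component, here is a minimal PyTorch bigram language model together with one training step and greedy-free sampling. The class name and tensor shapes mirror Karpathy's tutorial, but the random batch and hyperparameters are purely illustrative; refer to the notebooks for the actual training code.

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class BigramLanguageModel(nn.Module):
    """Predicts the next token from the current token via a single embedding lookup."""

    def __init__(self, vocab_size):
        super().__init__()
        # Each row of the table is the logit distribution over the next token.
        self.token_embedding_table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx, targets=None):
        # idx: (B, T) tensor of token indices
        logits = self.token_embedding_table(idx)  # (B, T, vocab_size)
        if targets is None:
            return logits, None
        B, T, C = logits.shape
        loss = F.cross_entropy(logits.view(B * T, C), targets.view(B * T))
        return logits, loss

    @torch.no_grad()
    def generate(self, idx, max_new_tokens):
        for _ in range(max_new_tokens):
            logits, _ = self(idx)
            probs = F.softmax(logits[:, -1, :], dim=-1)   # distribution for the last position
            idx_next = torch.multinomial(probs, num_samples=1)
            idx = torch.cat((idx, idx_next), dim=1)       # append the sampled token
        return idx

# Illustrative training step on random data, just to show the shapes involved.
vocab_size = 65  # e.g. number of unique characters in the Shakespeare text
model = BigramLanguageModel(vocab_size)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
xb = torch.randint(vocab_size, (4, 8))   # (batch, block) of input tokens
yb = torch.randint(vocab_size, (4, 8))   # in real training, targets are inputs shifted by one
logits, loss = model(xb, yb)
optimizer.zero_grad(set_to_none=True)
loss.backward()
optimizer.step()
```

The Transformer-based GPT replaces the single embedding lookup with token and position embeddings followed by stacked self-attention blocks, but keeps the same cross-entropy training loop and the same `generate` interface.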
Check out the Colab notebooks to run and experiment with the models.
I also wrote a companion article (in both English and Chinese) exploring how different modeling approaches tackle the core challenges of language modeling:
Language Model 01: From Bigram, N-gram to GPT (PDF)
Run the notebooks directly on Google Colab, or install PyTorch locally to run them in your own environment.
This project closely follows Andrej Karpathy's GPT tutorial. All credit for the original idea and inspiration goes to him. My goal here is to internalize and extend the knowledge through hands-on implementation and reflection.
This project is released under the MIT License, with proper attribution to Andrej Karpathy as the source of the tutorial and base code structure. Please refer to his original repo if you intend to use this work commercially or academically.