An educational, minimalist implementation of GPT-style language models in PyTorch, following Andrej Karpathy's video walkthrough. This repo includes both a Bigram Language Model and a Transformer-based GPT, modularized for clarity and extensibility.
This project walks through building a GPT-style model from scratch, with the following components:
- A Bigram Language Model for foundational understanding (see the sketch after this list)
- A Transformer-based GPT with:
  - Tokenizer
  - Self-attention mechanism
  - LayerNorm & MLP
  - Training loop with cross-entropy loss
- Trained and evaluated on the Shakespeare dataset using Google Colab
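
As a rough sketch of the bigram component, here is a minimal PyTorch bigram language model together with one training step and greedy-free sampling. The class name and tensor shapes mirror Karpathy's tutorial, but the random batch and hyperparameters are purely illustrative; refer to the notebooks for the actual training code.

```python
import torch
import torch.nn as nn
from torch.nn import functional as F

class BigramLanguageModel(nn.Module):
    """Predicts the next token from the current token via a single embedding lookup."""

    def __init__(self, vocab_size):
        super().__init__()
        # Each row of the table is the logit distribution over the next token.
        self.token_embedding_table = nn.Embedding(vocab_size, vocab_size)

    def forward(self, idx, targets=None):
        # idx: (B, T) tensor of token indices
        logits = self.token_embedding_table(idx)  # (B, T, vocab_size)
        if targets is None:
            return logits, None
        B, T, C = logits.shape
        loss = F.cross_entropy(logits.view(B * T, C), targets.view(B * T))
        return logits, loss

    @torch.no_grad()
    def generate(self, idx, max_new_tokens):
        for _ in range(max_new_tokens):
            logits, _ = self(idx)
            probs = F.softmax(logits[:, -1, :], dim=-1)   # distribution for the last position
            idx_next = torch.multinomial(probs, num_samples=1)
            idx = torch.cat((idx, idx_next), dim=1)       # append the sampled token
        return idx

# Illustrative training step on random data, just to show the shapes involved.
vocab_size = 65  # e.g. number of unique characters in the Shakespeare text
model = BigramLanguageModel(vocab_size)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
xb = torch.randint(vocab_size, (4, 8))   # (batch, block) of input tokens
yb = torch.randint(vocab_size, (4, 8))   # in real training, targets are inputs shifted by one
logits, loss = model(xb, yb)
optimizer.zero_grad(set_to_none=True)
loss.backward()
optimizer.step()
```

The Transformer-based GPT replaces the single embedding lookup with token and position embeddings followed by stacked self-attention blocks, but keeps the same cross-entropy training loop and the same `generate` interface.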
Check out the Colab notebooks to run and experiment with the models.
I also wrote a companion article (in both English and Chinese) exploring how different modeling approaches tackle the core challenges of language modeling:
Language Model 01: From Bigram, N-gram to GPT (PDF)
Run the notebooks directly on Google Colab, or install PyTorch locally to run them in your own environment.
This project closely follows Andrej Karpathy's GPT tutorial. All credit for the original idea and inspiration goes to him. My goal here is to internalize and extend the knowledge through hands-on implementation and reflection.
This project is released under the MIT License, with proper attribution to Andrej Karpathy as the source of the tutorial and base code structure. Please refer to his original repo if you intend to use this work commercially or academically.