A minimal toy example of the KV-cache used to speed up large language model transformers, using only NumPy. Composed from an edited version of jaymody/picoGPT#7, with moving diagrams from https://medium.com/@joaolages/kv-caching-explained-276520203249 alongside code annotations.
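The core idea, as a minimal sketch (illustrative names like `attention_step`, `w_q`, and `kv_cache` are mine, not the code in `gpt2.py`): during autoregressive generation, the keys and values of past tokens never change, so we cache them and only compute attention for the newest token instead of reprocessing the whole sequence.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_step(x_new, w_q, w_k, w_v, kv_cache):
    """Attend for a single new token, reusing cached keys/values.

    x_new: [d_model] embedding of the newest token
    kv_cache: (k, v) arrays of shape [n_past, d], or None on the first step
    """
    q = x_new @ w_q                             # query for the new token only
    k_new, v_new = x_new @ w_k, x_new @ w_v     # key/value for the new token
    if kv_cache is None:
        k, v = k_new[None, :], v_new[None, :]
    else:
        k = np.vstack([kv_cache[0], k_new])     # append; past keys are reused
        v = np.vstack([kv_cache[1], v_new])     # append; past values are reused
    scores = q @ k.T / np.sqrt(k.shape[-1])     # [n_past + 1] attention scores
    out = softmax(scores) @ v                   # weighted sum of values
    return out, (k, v)                          # return the updated cache

# toy usage with random weights: each step attends over all tokens so far
d = 8
rng = np.random.default_rng(0)
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
cache = None
for t in range(3):
    out, cache = attention_step(rng.standard_normal(d), w_q, w_k, w_v, cache)
```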
Setup:

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
A quick breakdown of each of the files:
* `encoder.py` contains the code for OpenAI's BPE tokenizer, taken straight from their gpt-2 repo.
* `utils.py` contains the code to download and load the GPT-2 model weights, tokenizer, and hyper-parameters.
* `gpt2.py` contains the actual GPT model and generation code, which we can run as a Python script.
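For example, assuming this fork keeps upstream picoGPT's CLI (prompt as a positional argument, with an `--n_tokens_to_generate` flag; check `gpt2.py` for the exact interface):

```bash
python gpt2.py "Alan Turing theorized that computers would one day become" --n_tokens_to_generate 40
```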