A minimal toy example of the KV-cache used to speed up large language model transformers, using only NumPy. Composed from an edited version of jaymody/picoGPT#7, with moving diagrams from https://medium.com/@joaolages/kv-caching-explained-276520203249 alongside code annotations.
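The core idea, as a minimal sketch (illustrative names like `attention_step`, `w_q`, and `kv_cache` are mine, not the code in `gpt2.py`): during autoregressive generation, the keys and values of past tokens never change, so we cache them and only compute attention for the newest token instead of reprocessing the whole sequence.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_step(x_new, w_q, w_k, w_v, kv_cache):
    """Attend for a single new token, reusing cached keys/values.

    x_new: [d_model] embedding of the newest token
    kv_cache: (k, v) arrays of shape [n_past, d], or None on the first step
    """
    q = x_new @ w_q                             # query for the new token only
    k_new, v_new = x_new @ w_k, x_new @ w_v     # key/value for the new token
    if kv_cache is None:
        k, v = k_new[None, :], v_new[None, :]
    else:
        k = np.vstack([kv_cache[0], k_new])     # append; past keys are reused
        v = np.vstack([kv_cache[1], v_new])     # append; past values are reused
    scores = q @ k.T / np.sqrt(k.shape[-1])     # [n_past + 1] attention scores
    out = softmax(scores) @ v                   # weighted sum of values
    return out, (k, v)                          # return the updated cache

# toy usage with random weights: each step attends over all tokens so far
d = 8
rng = np.random.default_rng(0)
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
cache = None
for t in range(3):
    out, cache = attention_step(rng.standard_normal(d), w_q, w_k, w_v, cache)
```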
Setup:

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
A quick breakdown of each of the files:
* `encoder.py` contains the code for OpenAI's BPE tokenizer, taken straight from their gpt-2 repo.
* `utils.py` contains the code to download and load the GPT-2 model weights, tokenizer, and hyper-parameters.
* `gpt2.py` contains the actual GPT model and generation code, which we can run as a Python script.
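For example, assuming this fork keeps upstream picoGPT's CLI (prompt as a positional argument, with an `--n_tokens_to_generate` flag; check `gpt2.py` for the exact interface):

```bash
python gpt2.py "Alan Turing theorized that computers would one day become" --n_tokens_to_generate 40
```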