
KV-caching-toy-example

A minimal toy example of the KV cache used to speed up autoregressive generation in large language model transformers, written using only numpy. The code is adapted from jaymody/picoGPT#7, with code annotations and the animated diagrams from https://medium.com/@joaolages/kv-caching-explained-276520203249 shown alongside.
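
The core mechanics fit in a few lines of numpy. Below is a standalone sketch (not code from this repo; softmax, attend, and the random weight matrices are illustrative stand-ins): at each decoding step, only the new token's key and value are computed and appended to the cache, while its query attends over everything cached so far, instead of recomputing keys and values for the whole sequence.

import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(q, K, V):
    # scaled dot-product attention for a single query vector
    scores = q @ K.T / np.sqrt(K.shape[-1])  # one score per cached position
    return softmax(scores) @ V

d = 4
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

K_cache = np.empty((0, d))  # keys of all previously seen tokens
V_cache = np.empty((0, d))  # values of all previously seen tokens

for step in range(5):
    x = rng.normal(size=d)                   # embedding of the newest token (stand-in)
    K_cache = np.vstack([K_cache, x @ Wk])   # append one new key ...
    V_cache = np.vstack([V_cache, x @ Wv])   # ... and one new value, instead of recomputing all
    out = attend(x @ Wq, K_cache, V_cache)   # new token's query attends over the whole cache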

Dependencies

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

START_HERE

A quick breakdown of each of the files:

  • encoder.py contains the code for OpenAI's BPE tokenizer, taken straight from their GPT-2 repo.
  • utils.py contains the code to download and load the GPT-2 model weights, tokenizer, and hyperparameters.
  • gpt2.py contains the actual GPT model and generation code, which we can run as a Python script (see the sketch after this list).
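
For orientation, the pieces might be wired together as below. This is a sketch assuming the edited version keeps upstream picoGPT's function names (load_encoder_hparams_and_params in utils.py, generate in gpt2.py); check the files for the exact signatures used here.

from utils import load_encoder_hparams_and_params
from gpt2 import generate

# Download (on first run) and load the 124M GPT-2 weights, tokenizer, and hyperparameters.
encoder, hparams, params = load_encoder_hparams_and_params("124M", "models")

# Encode a prompt, generate a few tokens, and decode them back to text.
input_ids = encoder.encode("Alan Turing theorized that computers would one day become")
output_ids = generate(input_ids, params, hparams["n_head"], n_tokens_to_generate=8)
print(encoder.decode(output_ids))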