Implement kv cache
certik opened this issue · 0 comments
certik commented
Here is some information on what the KV cache is: https://kipp.ly/blog/transformer-inference-arithmetic/#kv-cache
Roughly speaking, when new tokens are appended to the input and a new token is generated, much of the computation from the previous iteration can be reused. We need to cache those results and reuse them.
Here is a reference implementation in picoGPT that should be straightforward to adapt: jaymody/picoGPT#7 (see also the accompanying blog post https://immortal3.github.io/becoming-the-unbeatable/posts/gpt-kvcache/).
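To make the idea concrete, here is a minimal NumPy sketch of single-head attention with a KV cache, roughly in the style of picoGPT. The names (`attention_with_cache`, `w_q`, `w_k`, `w_v`, the `cache` dict) are illustrative and not taken from either codebase; the point is only that each step computes Q/K/V for the new token alone and appends K/V to the cache instead of recomputing them for the whole sequence.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_with_cache(x_new, w_q, w_k, w_v, cache):
    """Single-head causal attention for one newly appended token.

    x_new : (d_model,) embedding of the new token
    cache : dict with "k" and "v" arrays of shape (n_past, d_head),
            holding keys/values already computed for earlier tokens
    Returns the attention output for the new token and the updated cache.
    """
    q = x_new @ w_q  # query, key, value for the new token only
    k = x_new @ w_k
    v = x_new @ w_v

    # append the new key/value instead of recomputing K and V
    # for the entire sequence on every generation step
    cache["k"] = np.vstack([cache["k"], k[None, :]])
    cache["v"] = np.vstack([cache["v"], v[None, :]])

    d_head = q.shape[-1]
    scores = (cache["k"] @ q) / np.sqrt(d_head)  # (n_past + 1,)
    weights = softmax(scores)
    out = weights @ cache["v"]                   # (d_head,)
    return out, cache

# toy usage: process tokens one at a time, as in autoregressive generation
d_model = d_head = 8
rng = np.random.default_rng(0)
w_q, w_k, w_v = (rng.standard_normal((d_model, d_head)) for _ in range(3))
cache = {"k": np.empty((0, d_head)), "v": np.empty((0, d_head))}
for t in range(4):
    x_t = rng.standard_normal(d_model)
    out, cache = attention_with_cache(x_t, w_q, w_k, w_v, cache)
```

Note that causal masking comes for free: the new token only attends over the cached (earlier) keys plus its own, so per-step attention cost drops from O(n^2) to O(n) in the sequence length.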