Saving Keys and Values Cache at lower precision

Question

Ayushk4 opened this issue 2 years ago · 0 comments

For reference, see https://github.com/FMInference/FlexGen, which explores storing the key/value cache with 4-bit quantization.
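
To make the idea concrete, here is a rough NumPy sketch of what storing the cache at 4 bits could look like. It uses group-wise min-max quantization and packs two 4-bit codes into each byte. This is only an illustration under my own assumptions, not FlexGen's actual code: the function names `quantize_4bit` and `dequantize_4bit`, the group size of 64, and the `(batch, heads, seq, head_dim)` layout are all hypothetical.

```python
import numpy as np


def quantize_4bit(x, group_size=64):
    """Group-wise min-max quantization of a float array to packed 4-bit codes.

    Hypothetical helper for illustration. Assumes x.size is divisible by
    group_size and that group_size is even.
    """
    assert group_size % 2 == 0 and x.size % group_size == 0
    flat = x.astype(np.float32).reshape(-1, group_size)
    lo = flat.min(axis=1, keepdims=True)   # per-group minimum (zero point)
    hi = flat.max(axis=1, keepdims=True)   # per-group maximum
    scale = (hi - lo) / 15.0               # 4 bits give 16 levels: 0..15
    scale[scale == 0] = 1.0                # constant group: avoid dividing by zero
    codes = np.clip(np.round((flat - lo) / scale), 0, 15).astype(np.uint8)
    # Pack two 4-bit codes per byte: even columns in the high nibble,
    # odd columns in the low nibble.
    packed = (codes[:, 0::2] << 4) | codes[:, 1::2]
    return packed, scale, lo


def dequantize_4bit(packed, scale, lo, shape):
    """Unpack the 4-bit codes and map them back to float32."""
    codes = np.empty((packed.shape[0], packed.shape[1] * 2), dtype=np.uint8)
    codes[:, 0::2] = packed >> 4
    codes[:, 1::2] = packed & 0x0F
    return (codes.astype(np.float32) * scale + lo).reshape(shape)


# Toy "key cache" with an assumed (batch, heads, seq, head_dim) layout.
rng = np.random.default_rng(0)
keys = rng.standard_normal((2, 8, 16, 64)).astype(np.float16)

packed, scale, lo = quantize_4bit(keys)
restored = dequantize_4bit(packed, scale, lo, keys.shape)
max_err = float(np.abs(restored - keys.astype(np.float32)).max())
```

The packed codes take a quarter of the bytes of the fp16 tensor, plus one `scale` and one `lo` value per group. The reconstruction error per element is bounded by roughly half of that group's `scale`, so smaller groups trade extra metadata for lower error.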