NolanoOrg/cformers

Saving Keys and Values Cache at lower precision

Ayushk4 opened this issue · 0 comments

See https://github.com/FMInference/FlexGen, which explores storing the key/value (KV) cache with 4-bit quantization.
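
To make the idea concrete, here is a minimal sketch of one generic way to hold cached values at 4 bits: group-wise min/max quantization, with two 4-bit codes packed into each byte. It is not taken from FlexGen's or cformers' code. The function names `quantize_4bit` and `dequantize_4bit`, the flat-list layout, and the group size of 64 are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

GROUP_SIZE = 64  # hypothetical group size, chosen only for this sketch


def quantize_4bit(
    values: Sequence[float], group_size: int = GROUP_SIZE
) -> Tuple[bytes, List[Tuple[float, float]]]:
    """Quantize floats to 4-bit codes, one (min, scale) pair per group."""
    if group_size % 2 != 0 or len(values) % group_size != 0:
        raise ValueError("length must be a multiple of an even group_size")
    packed = bytearray()
    params: List[Tuple[float, float]] = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        lo, hi = min(group), max(group)
        # 4 bits give 16 levels (codes 0..15); fall back to 1.0 for constant groups.
        scale = (hi - lo) / 15 or 1.0
        codes = [min(15, max(0, round((v - lo) / scale))) for v in group]
        # Pack two codes per byte: even index in the low nibble, odd in the high.
        for i in range(0, group_size, 2):
            packed.append(codes[i] | (codes[i + 1] << 4))
        params.append((lo, scale))
    return bytes(packed), params


def dequantize_4bit(
    packed: bytes, params: Sequence[Tuple[float, float]], group_size: int = GROUP_SIZE
) -> List[float]:
    """Recover approximate floats from packed 4-bit codes."""
    bytes_per_group = group_size // 2
    values: List[float] = []
    for g, (lo, scale) in enumerate(params):
        for b in packed[g * bytes_per_group:(g + 1) * bytes_per_group]:
            values.append(lo + (b & 0x0F) * scale)
            values.append(lo + (b >> 4) * scale)
    return values
```

In this sketch each cached value costs half a byte plus a small per-group overhead for the `(min, scale)` pair, and the reconstruction error per value is at most about half of that group's `scale`. A real implementation would operate on the key and value tensors directly and would need to measure the effect on output quality.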