Saving Keys and Values Cache at lower precision
Ayushk4 opened this issue · 0 comments
Ayushk4 commented
See https://github.com/FMInference/FlexGen: they have explored storing the key/value cache with 4-bit quantization.
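To make the suggestion concrete, here is a minimal sketch of what storing a cache tensor at 4 bits could look like. It assumes a generic group-wise min/max scheme written in NumPy. It is not taken from FlexGen's code, and the names `quantize_4bit`, `dequantize_4bit`, and `group_size` are hypothetical, chosen only for illustration.

```python
import numpy as np

def quantize_4bit(x, group_size=64):
    """Quantize a float array to 4-bit codes with per-group min/max scaling.

    Assumes x.size is divisible by group_size and group_size is even.
    """
    flat = x.astype(np.float32).reshape(-1, group_size)
    lo = flat.min(axis=1, keepdims=True)
    hi = flat.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0          # 4 bits give 16 levels: codes 0..15
    scale[scale == 0] = 1.0           # constant groups: avoid division by zero
    q = np.clip(np.round((flat - lo) / scale), 0, 15).astype(np.uint8)
    packed = (q[:, 0::2] << 4) | q[:, 1::2]   # pack two 4-bit codes per byte
    return packed, scale, lo

def dequantize_4bit(packed, scale, lo, shape):
    """Unpack the 4-bit codes and map them back to approximate float values."""
    q = np.empty((packed.shape[0], packed.shape[1] * 2), dtype=np.uint8)
    q[:, 0::2] = packed >> 4
    q[:, 1::2] = packed & 0x0F
    return (q.astype(np.float32) * scale + lo).reshape(shape)

# Example: a fake key cache of shape (batch, heads, seq_len, head_dim).
rng = np.random.default_rng(0)
keys = rng.standard_normal((2, 4, 16, 64)).astype(np.float32)
packed, scale, lo = quantize_4bit(keys)
restored = dequantize_4bit(packed, scale, lo, keys.shape)
```

In this sketch the packed codes take half a byte per value. The per-group `scale` and `lo` add overhead: with `group_size=64` and both stored as float32, that is one extra bit per value, and the reconstruction error per element is bounded by half the group's scale.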