SqueezeAILab/KVQuant
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Issues
- How to reproduce Table 19 (KVQuant vs. KIVI) (#14, opened by condy0919, 2 comments)
- Coupled Channel-wise Quantization (#12, opened by naston, 0 comments; see the per-channel vs. per-token sketch after this list)
- Would the current implementation of Fisher Information work out of the box with Multi-head Latent Attention? (#11, opened by naston, 1 comment)
- Pre-RoPE quantization during inference (#1, opened by minghaoBD, 1 comment; see the pre-RoPE sketch after this list)
- Question about storage (#8, opened by mlxht990720, 0 comments)
- Problem when reproducing experiments (#5, opened by cat538, 1 comment)
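Several of the issues above (notably #12 on coupled channel-wise quantization) touch on the axis along which the KV cache is quantized. The KVQuant paper quantizes Keys per-channel, since Key outliers concentrate in a few channels, and Values per-token. Below is a minimal fake-quantization sketch of that asymmetry, assuming simple uniform min-max quantization; the `quantize` helper and its defaults are illustrative, not the repository's implementation.

```python
import torch

def quantize(x: torch.Tensor, dim: int, n_bits: int = 4) -> torch.Tensor:
    """Uniform min-max fake quantization.

    `dim` is the dimension reduced when computing the range, so for a
    (seq_len, head_dim) tensor:
      dim=0 -> one (scale, zero-point) per channel (Key cache)
      dim=1 -> one (scale, zero-point) per token   (Value cache)
    """
    xmin = x.amin(dim=dim, keepdim=True)
    xmax = x.amax(dim=dim, keepdim=True)
    scale = (xmax - xmin).clamp(min=1e-8) / (2 ** n_bits - 1)
    q = ((x - xmin) / scale).round()
    return q * scale + xmin  # dequantized values, for accuracy simulation

seq_len, head_dim = 16, 64
keys = torch.randn(seq_len, head_dim)
values = torch.randn(seq_len, head_dim)

keys_dq = quantize(keys, dim=0)      # per-channel: Key outliers are channel-aligned
values_dq = quantize(values, dim=1)  # per-token: Value ranges vary by token
```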
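Issue #1 asks how pre-RoPE quantization works during inference. The paper's approach is to quantize Keys before the rotary embedding, where their distributions are better behaved, and re-apply RoPE on the fly when the cache is read, which is possible because token positions are known at attention time. The sketch below reuses the hypothetical `quantize` helper above; `rope` is a generic interleaved rotary implementation, not necessarily the kernel the repo ships.

```python
def rope(x: torch.Tensor, pos: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply interleaved rotary position embeddings over the last dim."""
    d = x.shape[-1]
    inv_freq = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)
    ang = pos[:, None] * inv_freq[None, :]      # (seq_len, d/2)
    cos, sin = ang.cos(), ang.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

pos = torch.arange(seq_len, dtype=torch.float32)
k_cached = quantize(keys, dim=0)   # store *unrotated* Keys in low precision
k_for_attn = rope(k_cached, pos)   # rotate only when the cache is read
```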