SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
Primary language: Python. License: MIT.