/neural-speed

A library for efficient LLM inference based on state-of-the-art low-bit quantization and sparsity.

Primary Language: C++
