DDEle/neural-speed
A library for efficient LLM inference based on SOTA low-bit quantization and sparsity
C++