An innovative library for efficient LLM inference via low-bit quantization
Primary language: C++ | License: Apache License 2.0 (Apache-2.0)