/neural-speed

An innovative library for efficient LLM inference via low-bit quantization

Primary Language: C++

License: Apache License 2.0 (Apache-2.0)
