A high-performance inference system for large language models, designed for production environments.
Primary language: C++. License: Apache License 2.0 (Apache-2.0).