A serious server implementation of LlamaCPP. OpenAI-compatible API, queue, and scaling. Embed a production-level local inference engine in your apps.
Primary language: C++. License: GNU Affero General Public License v3.0 (AGPL-3.0).