A single-binary, GPU-accelerated LLM server (HTTP and WebSocket API) written in Rust
Primary language: Rust; license: Other (NOASSERTION)