IntelliLLM

An intelligent LLM serving system that uses ML-driven scheduling, load balancing, request migration and preemption, and KV-cache management to achieve high throughput, low latency, and fault tolerance.

Primary language: Python. License: Apache-2.0.
