James-QiuHaoran/IntelliLLM
An intelligent LLM serving system based on ML-driven scheduling, load-balancing, request migration and preemption, and KV cache management for high throughput, low latency, and fault tolerance.
Python · Apache-2.0
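To make the "ML-driven scheduling" idea concrete, below is a minimal, hypothetical Python sketch of one common approach: order waiting requests by a learned prediction of their output length, approximating shortest-remaining-time-first. The names (`LengthPredictor`, `Scheduler`, `Request`) and the crude word-count predictor are illustrative assumptions, not IntelliLLM's actual API.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    predicted_tokens: int                      # predicted output length (scheduling key)
    request_id: str = field(compare=False)
    prompt: str = field(compare=False)


class LengthPredictor:
    """Placeholder for a learned output-length predictor."""

    def predict(self, prompt: str) -> int:
        # A real system would use a trained model; this is a crude proxy.
        return max(16, len(prompt.split()) * 2)


class Scheduler:
    """Dispatches the request expected to finish soonest first."""

    def __init__(self, predictor: LengthPredictor):
        self.predictor = predictor
        self.queue: list[Request] = []

    def submit(self, request_id: str, prompt: str) -> None:
        predicted = self.predictor.predict(prompt)
        heapq.heappush(self.queue, Request(predicted, request_id, prompt))

    def next_request(self) -> Request | None:
        return heapq.heappop(self.queue) if self.queue else None


if __name__ == "__main__":
    sched = Scheduler(LengthPredictor())
    sched.submit("r1", "Write a detailed essay about distributed systems and fault tolerance")
    sched.submit("r2", "Say hi")
    print(sched.next_request().request_id)  # "r2": shorter predicted output, dispatched first
```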