James-QiuHaoran/IntelliLLM
An intelligent LLM serving system based on ML-driven scheduling, load-balancing, request migration and preemption, and KV cache management for high throughput, low latency, and fault tolerance.
Python · Apache-2.0
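To make the "ML-driven scheduling" idea concrete, below is a minimal, hypothetical Python sketch of one common approach: order waiting requests by a learned prediction of their output length, approximating shortest-remaining-time-first. The names (`LengthPredictor`, `Scheduler`, `Request`) and the crude word-count predictor are illustrative assumptions, not IntelliLLM's actual API.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    predicted_tokens: int                      # predicted output length (scheduling key)
    request_id: str = field(compare=False)
    prompt: str = field(compare=False)


class LengthPredictor:
    """Placeholder for a learned output-length predictor."""

    def predict(self, prompt: str) -> int:
        # A real system would use a trained model; this is a crude proxy.
        return max(16, len(prompt.split()) * 2)


class Scheduler:
    """Dispatches the request expected to finish soonest first."""

    def __init__(self, predictor: LengthPredictor):
        self.predictor = predictor
        self.queue: list[Request] = []

    def submit(self, request_id: str, prompt: str) -> None:
        predicted = self.predictor.predict(prompt)
        heapq.heappush(self.queue, Request(predicted, request_id, prompt))

    def next_request(self) -> Request | None:
        return heapq.heappop(self.queue) if self.queue else None


if __name__ == "__main__":
    sched = Scheduler(LengthPredictor())
    sched.submit("r1", "Write a detailed essay about distributed systems and fault tolerance")
    sched.submit("r2", "Say hi")
    print(sched.next_request().request_id)  # "r2": shorter predicted output, dispatched first
```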