substratusai/kubeai
AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports LLMs, embeddings, and speech-to-text.
Go · Apache-2.0
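For context on what the operator manages, models in KubeAI are declared through a `Model` custom resource. The sketch below is illustrative only: the model name, Hugging Face URL, and resource profile are assumptions on my part, and the field names should be checked against the chart's current CRD before use.

```yaml
# Hypothetical sketch of a KubeAI Model resource (kubeai.org/v1).
# Field names are my best recollection of the documented CRD; verify
# against the installed chart.
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-8b-instruct   # assumed name for illustration
spec:
  features: [TextGeneration]
  url: hf://meta-llama/Llama-3.1-8B-Instruct   # assumed model URL
  engine: VLLM                  # vLLM and Ollama engines appear in the issues below
  resourceProfile: nvidia-gpu-l4:1   # see issue #152 on accelerator types
  minReplicas: 0                # scale to zero when idle (see issues #73, #67)
```

Applying a manifest like this is what triggers the autoscaling and load-balancing behavior discussed in several of the issues below.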
Issues
- Configure a default timeout of 30 minutes (#58, opened by samos123, 0 comments)
- Support Google TPUs (#156, opened by nstogner, 1 comment)
- Disable Open WebUI admin panel (#150, opened by nstogner, 2 comments)
- Support accelerator type in resource profile (#152, opened by samos123, 0 comments)
- UI for managing models (#148, opened by nstogner, 0 comments)
- Build and Publish KubeAI images for ARM64 (#134, opened by BOsterbuhr, 0 comments)
- Support PEFT jobs (LoRA) (#138, opened by nstogner, 0 comments)
- Support multi-tenancy in the Chat UI (#136, opened by nstogner, 0 comments)
- Support dynamic LoRA serving (#132, opened by nstogner, 0 comments)
- Support passthrough model access (#125, opened by nstogner, 0 comments)
- Support autoscaling for the Ollama model server (#123, opened by nstogner, 5 comments)
- Load balancing doesn't seem to spread evenly (#113, opened by samos123, 0 comments)
- chunked prefill cannot be used with prefix caching (#119, opened by samos123, 0 comments)
- Dedup received messages (#118, opened by nstogner, 2 comments)
- lingo messenger crash causes restart of lingo (#100, opened by samos123, 1 comment)
- Refactor logging to use a library (#79, opened by samos123, 0 comments)
- Feature request: ability to configure the time window used to calculate average active requests (#108, opened by nstogner, 0 comments)
- health check results in "unable to parse model" error (#105, opened by samos123, 0 comments)
- Add docs on how to use pub/sub integration (#103, opened by samos123, 0 comments)
- Support OpenAI API key-based authentication (#101, opened by samos123, 0 comments)
- Add flash attention in vLLM Helm chart (#99, opened by samos123, 0 comments)
- e2e messenger GCP pubsub system tests (#98, opened by samos123, 0 comments)
- vLLM occasionally gets into broken state (#96, opened by samos123, 0 comments)
- Messenger: panic: Ack/Nack called twice on (#93, opened by samos123, 0 comments)
- Messenger: Log the metadata of each message (#92, opened by samos123, 0 comments)
- Bucket integration (#90, opened by nstogner, 8 comments)
- Batch support through Pub/Sub (#86, opened by samos123, 0 comments)
- lingo HA mode failed to aggregate stats (#87, opened by samos123, 0 comments)
- Models endpoint (#57, opened by nstogner, 2 comments)
- Support Streaming (#56, opened by nstogner, 0 comments)
- Expose vLLM metrics through lingo (#85, opened by samos123, 2 comments)
- Scale to 0 not working with replicas 3 (#73, opened by samos123, 2 comments)
- Customizable codebase (#63, opened by nstogner, 1 comment)
- Flapping scale from 0 to 1 to 0 to 1 (#67, opened by samos123, 5 comments)
- Race: make-race failing on local machine (#53, opened by nstogner)
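Several of the issues above (#57 Models endpoint, #56 streaming, #101 API-key authentication) concern the project's OpenAI-compatible HTTP surface. As a minimal sketch of what a client call might look like, the Go snippet below builds a chat-completions request with bearer auth; the in-cluster hostname, path, and model name are assumptions for illustration, not confirmed against the project.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// buildChatRequest constructs an OpenAI-style chat-completions request.
// The base URL "http://kubeai/openai" is an assumption; the real service
// name depends on how the Helm release is installed.
func buildChatRequest(apiKey, model, prompt string) (*http.Request, error) {
	body := fmt.Sprintf(
		`{"model":%q,"messages":[{"role":"user","content":%q}],"stream":true}`,
		model, prompt)
	req, err := http.NewRequest("POST",
		"http://kubeai/openai/v1/chat/completions",
		bytes.NewBufferString(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	// Bearer auth, as proposed in issue #101.
	req.Header.Set("Authorization", "Bearer "+apiKey)
	return req, nil
}

func main() {
	req, err := buildChatRequest("sk-example", "llama-3.1-8b-instruct", "hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)
	fmt.Println(req.Header.Get("Authorization"))
}
```

The request is only constructed here, not sent; sending it with `http.DefaultClient.Do(req)` and reading the streamed response would require a running cluster.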