substratusai/kubeai

AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports LLMs, embeddings, and speech-to-text.

GoApache-2.0

Issues

Configure a default timeout of 30 minutes
#58 opened 5 months ago by samos123
2
Support Google TPUs
#156 opened 5 months ago by nstogner
0
Adding model does not check remote model access/authorization
#155 opened 5 months ago by strus38
1
Disable Open WebUI admin panel
#150 opened 5 months ago by nstogner
1
Support accelerator type in resource profile
#152 opened 5 months ago by samos123
2
Modifying kubeai configmap should auto refresh kubeai controller
#151 opened 5 months ago by samos123
0
Connections/Ollama API testing - http://host.docker.internal:11434 is failing
#145 opened 5 months ago by strus38
6
UI for managing models
#148 opened 5 months ago by nstogner
0
Node Selector should not be specified in default Helm values.yaml file
#146 opened 5 months ago by nstogner
0
Build and Publish KubeAI images for ARM64
#134 opened 5 months ago by BOsterbuhr
0
Create a proposal for managing model weights and adapters
#139 opened 5 months ago by nstogner
0
Support PEFT jobs (LoRA)
#138 opened 5 months ago by nstogner
0
Support using existing secret containing Huggingface token
#133 opened 5 months ago by BOsterbuhr
0
Support multi-tenancy in the Chat UI
#136 opened 5 months ago by nstogner
0
Support dynamic LoRA serving
#132 opened 5 months ago by nstogner
0
Support passthrough model access
#125 opened 5 months ago by nstogner
0
Support autoscaling for the Ollama model server
#123 opened 5 months ago by nstogner
0
Load balancing doesn't seem to spread evenly
#113 opened 5 months ago by samos123
5
chunked prefill cannot be used with prefix caching
#119 opened 6 months ago by samos123
0
Dedup received messages
#118 opened 6 months ago by nstogner
0
lingo messenger crashes causes restart of lingo
#100 opened 9 months ago by samos123
2
Improve scaling behavior when there are requests waiting to be queued
#106 opened 6 months ago by samos123
1
Refactor logging to use library
#79 opened a year ago by samos123
2
Feature request: ability to configure the time window used to calculate the average active requests.
#108 opened 7 months ago by nstogner
0
Messenger integration: Make error backoff configurable
#109 opened 7 months ago by nstogner
0
health check results in unable to parse model error
#105 opened 7 months ago by samos123
1
Add docs on how to use pub/sub integration
#103 opened 7 months ago by samos123
0
Support OpenAI API key based authentication
#101 opened 9 months ago by samos123
0
add flash attention in vLLM helm chart
#99 opened 9 months ago by samos123
0
e2e messenger GCP pubsub system tests
#98 opened 9 months ago by samos123
0
vLLM ocasionally gets into broken state
#96 opened 9 months ago by samos123
0
Restarting lingo should not cause instant scale downs
#95 opened 9 months ago by samos123
0
Messenger: Panic panic: Ack/Nack called twice on
#93 opened 9 months ago by samos123
0
Messenger: Log the metadata of each message
#92 opened 9 months ago by samos123
0
Bucket integration
#90 opened 10 months ago by nstogner
0
Batch support through Pub/Sub
#86 opened 10 months ago by samos123
8
Retry failures when consuming requests via messaging integration
#89 opened 10 months ago by nstogner
0
lingo ha mode failed to aggregate stats
#87 opened 10 months ago by samos123
0
Models endpoint
#57 opened a year ago by nstogner
0
Support Streaming
#56 opened 10 months ago by nstogner
2
Expose vLLM metrics through lingo
#85 opened 10 months ago by samos123
0
CNCF TAG-Runtime or/and CNCF Cloud Native WG discussion
#84 opened a year ago by raravena80
2
Service name shouldn't have to match deployment name
#59 opened a year ago by samos123
9
Only get stats and autoscale deployments with model annotation
#77 opened a year ago by samos123
0
Scale to 0 not working with replicas 3
#73 opened a year ago by samos123
0
Customizable codebase
#63 opened a year ago by nstogner
2
Flapping scale from 0 to 1 to 0 to 1
#67 opened a year ago by samos123
1
Lingo not leader even though there is only 1 replica after long pod uptime
#60 opened a year ago by samos123
5
Makefile has incorrect rules, running `make test` causes error
#68 opened a year ago by samos123
0
Race: make-race failing on local machine
#53 opened a year ago by nstogner
0