GoogleCloudPlatform/ai-on-gke
AI on GKE is a collection of examples, best-practices, and prebuilt solutions to help build, deploy, and scale AI Platforms on Google Kubernetes Engine
Jupyter Notebook · Apache-2.0
Issues
Include Ray cluster details in outputs.tf
#888 opened by andrewsykim - 4
Terraform in benchmarking has impossible provider requirements
#834 opened by christopher-hendrich-sada - 0
tools/gke-disk-image-builder: Fails with Message: Quota 'CPUS_ALL_REGIONS' exceeded.
#849 opened by katilp - 1
Remove 60s sleep waiting for GMP
#502 opened by andrewsykim - 1
RAG frontend is getting OOMKilled
#518 opened by andrewsykim - 1
Update the README to reflect the correct sample dataset & surface this info earlier in the README
#438 opened by kumar-dhanagopal - 1
RAG application leaks DB connection objects
#725 opened by bhlim - 1
Only set CLOUDSQL_INSTANCE_CONNECTION_NAME env var in Ray clusters when deploying RAG
#424 opened by andrewsykim - 0
CLOUDSQL_INSTANCE_CONNECTION_NAME env in RayCluster should only be configured for RAG deployments
#435 opened by andrewsykim - 0
Auto enable APIs in the infrastructure module
#474 opened by andrewsykim - 0
[benchmarks] enable TGI to use Hugging Face DLC
#821 opened by annapendleton - 0
create gke cluster CI is broken
#820 opened by arueth - 0
benchmark locust tool feature request: update locust requests to match LPG requests
#818 opened by annapendleton - 0
Add me-central2 to the list of supported locations.
#802 opened by alaliaa - 0
Failed Build: Solution Deployment RAG on GKE
#791 opened by HGPai - 0
Add flag to customize metric scrape interval.
#783 opened by raywainman - 0
Error: "POST /generate HTTP/1.1" 404 Not Found when running Locust tool against vLLM model server
#777 opened by Edwinhr716 - 0
ai-on-gke benchmark locust load inference feature request: implement multi-threading on locust worker processes
#768 opened by annapendleton - 0
ai-on-gke benchmark locust tool feature request: run locust worker and master on separate CPU nodes
#767 opened by annapendleton - 0
ai-on-gke benchmark locust load inferencer hits 90%+ CPU usage with master at 200+ users
#766 opened by annapendleton - 0
ERROR: Cloning repo
#760 opened by german-grandas - 0
RAG tf apply fails on AP cluster because AP does not scale up fast enough to deploy GMP
#750 opened by yiyinglovecoding - 0
Service Management API has not been used in project when creating playground
#700 opened by laurentgrangeau - 3
Kueue/DWS in helm chart
#688 opened by krisero - 2
Enable Image Streaming by default
#509 opened by andrewsykim - 1
Configure TPU Provisioner tests to run in CI
#659 opened by danielvegamyhre - 0
oauth2/google: invalid token JSON from metadata: EOF
#634 opened by kzos - 0
JupyterHub fails to start with proxy 599 timeout
#565 opened by artemvmin - 2
Error 400: Autopilot clusters must be regional clusters
#628 opened by kzos - 1
Resolve SQLAlchemy warning from RAG notebook
#555 opened by andrewsykim - 1
VPC networks cannot be deleted for RAG deployments if a private connection to CloudSQL is established
#550 opened by andrewsykim - 1
RAG cannot be deployed on an existing cluster due to CloudSQL requiring at least 1 private service connection
#529 opened by andrewsykim - 0
Autopilot e2e tests are flaky due to GMP webhook
#402 opened by andrewsykim - 0
Jupyter authenticator: Path requires update
#415 opened by hsachdevah