GoogleCloudPlatform/ai-on-gke
AI on GKE is a collection of examples, best practices, and prebuilt solutions to help build, deploy, and scale AI Platforms on Google Kubernetes Engine.
HCL | Apache-2.0
Issues
Service Management API has not been used in project when creating playground
#700 opened by laurentgrangeau - 0
Kueue/DWS in helm chart
#688 opened by krisero - 2
Enable Image Streaming by default
#509 opened by andrewsykim - 1
Configure TPU Provisioner tests to run in CI
#659 opened by danielvegamyhre - 0
oauth2/google: invalid token JSON from metadata: EOF
#634 opened by kzos - 0
Remove 60s sleep waiting for GMP
#502 opened by andrewsykim - 0
JupyterHub fails to start with proxy 599 timeout
#565 opened by artemvmin - 2
Error 400: Autopilot clusters must be regional clusters
#628 opened by kzos - 1
RAG frontend is getting OOMKilled
#518 opened by andrewsykim - 0
Resolve SQLAlchemy warning from RAG notebook
#555 opened by andrewsykim - 0
VPC networks cannot be deleted for RAG deployments if a private connection to CloudSQL is established
#550 opened by andrewsykim - 1
RAG cannot be deployed on an existing cluster due to CloudSQL requiring at least 1 private service connection
#529 opened by andrewsykim - 0
Auto enable APIs in the infrastructure module
#474 opened by andrewsykim - 2
Update the README to reflect the correct sample dataset & surface this info earlier in the README
#438 opened by kumar-dhanagopal - 0
CLOUDSQL_INSTANCE_CONNECTION_NAME env in RayCluster should only be configured for RAG deployments
#435 opened by andrewsykim - 0
Only set CLOUDSQL_INSTANCE_CONNECTION_NAME env var in Ray clusters when deploying RAG
#424 opened by andrewsykim - 0
Autopilot e2e tests are flaky due to GMP webhook
#402 opened by andrewsykim - 0
Jupyter authenticator: Path requires update
#415 opened by hsachdevah - 1
Document that RayClusters must be labeled with `app.kubernetes.io/name: kuberay` for kuberay-tpu-webhook
#308 opened by davidxia - 1
Broken links - TPU training tutorial
#343 opened by brandonroyal - 1
Remove dependency on GPUs in CI
#338 opened by andrewsykim - 0
[Feature] Add Dynamic Workload Scheduler example to GKE Batch Reference Architecture
#234 opened by alizaidis - 0
gke-disk-image-builder: Configurable service account
#166 opened by nstogner - 0
[GKE Batch Reference Architecture] Update Kaniko executor images in Cloud Build create and destroy steps.
#208 opened by alizaidis - 0
[GKE Batch Reference Architecture] Update versions of google and google-beta Terraform providers.
#206 opened by alizaidis - 0
gke-disk-image-builder: Configurable instance network settings with limited permissions
#169 opened by nstogner - 0
gke-disk-image-builder: Configurable instance tags
#167 opened by nstogner - 0
Add reference architecture for a batch processing platform on GKE using Kueue
#149 opened by alizaidis - 0
Kuberay TPU Webhook
#114 opened by ryanaoleary - 0
Pre-submit check for file License Headers
#106 opened by chiayi - 0
Tutorial: Finetuning Llama 7b on GKE using L4 GPUs invalid gcloud container clusters create
#92 opened by dooskin