GoogleCloudPlatform/ai-on-gke

Autopilot e2e tests are flaky due to the GMP webhook

Closed this issue · 0 comments

Sometimes the Autopilot e2e tests fail because of the GMP webhook:

Error: Internal error occurred: failed calling webhook "default.podmonitorings.gmp-operator.gke-gmp-system.monitoring.googleapis.com": failed to call webhook: Post "https://gmp-operator.gke-gmp-system.svc:443/default/monitoring.googleapis.com/v1/podmonitorings?timeout=10s": No agent available

  with module.kuberay-monitoring[0].helm_release.gmp-engine,
  on ../../modules/kuberay-monitoring/main.tf line 16, in resource "helm_release" "gmp-engine":
  16: resource "helm_release" "gmp-engine" {

My guess is that this happens because the Autopilot cluster has no nodes initially, so the webhook can't serve this request.
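If that guess is right, the failure should be transient, because Autopilot eventually provisions a node. One possible mitigation for the test harness would be to retry the apply only when it fails with this specific webhook error. The sketch below is illustrative, not part of the ai-on-gke test code: the names `apply_with_retries`, `is_transient_webhook_error`, and `run_terraform_apply`, the marker strings used to recognize the error, and the retry count and delay are all assumptions.

```python
import subprocess
import time
from typing import Callable, Tuple

# Assumption: these substrings (taken from the error above) are enough to
# recognize the transient GMP webhook failure.
WEBHOOK_MARKERS = ("failed calling webhook", "No agent available")


def is_transient_webhook_error(output: str) -> bool:
    """Return True if the output looks like the GMP webhook failure."""
    return all(marker in output for marker in WEBHOOK_MARKERS)


def apply_with_retries(
    run: Callable[[], Tuple[int, str]],
    attempts: int = 3,
    delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `run` and retry only while it fails with the webhook error.

    `run` returns (exit_code, combined_output). Any other failure is
    returned immediately so real errors are not masked by retries.
    """
    code, output = run()
    for _ in range(attempts - 1):
        if code == 0 or not is_transient_webhook_error(output):
            break
        sleep(delay_s)  # hypothetical wait for Autopilot to bring up a node
        code, output = run()
    return code, output


def run_terraform_apply() -> Tuple[int, str]:
    """Hypothetical wrapper around `terraform apply` for use as `run`."""
    proc = subprocess.run(
        ["terraform", "apply", "-auto-approve"],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout + proc.stderr
```

Usage would look like `apply_with_retries(run_terraform_apply)`. Matching on the specific error text keeps unrelated failures visible instead of hiding them behind retries. The trade-off is that the check breaks silently if the webhook's error message changes.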