GoogleCloudPlatform/gke-autoneg-controller

Reconcile error: network endpoint group in a specific zone not found

Opened this issue · 4 comments

We deployed autoneg in one of our clusters running GKE Autopilot. When a workload running on it gets scheduled to only some of the available zones rather than all of them, NEGs are not created in every zone.

That means that autoneg will fail and stop reconciling.

The relevant part of the failing logs:

2021-10-22T17:55:52.092Z	INFO	controllers.Service	Applying intended status	{"service": "envoy/envoy", "status": {"backend_services":{"8000":{"myproduct":{"name":"myproduct","max_connections_per_endpoint":1000}}},"network_endpoint_groups":{"8000":"k8s1-4fd3dc4c-envoy-envoy-8000-44f9746b","8001":"k8s1-4fd3dc4c-envoy-envoy-8001-7508741b"},"zones":["europe-west1-b","europe-west1-c","europe-west1-d"]}}
2021-10-22T17:55:52.762Z	ERROR	controller-runtime.controller	Reconciler error	{"controller": "service", "request": "envoy/envoy", "error": "googleapi: Error 404: The resource 'projects/myproduct-dev/zones/europe-west1-c/networkEndpointGroups/k8s1-4fd3dc4c-envoy-envoy-8000-44f9746b' was not found, notFound"}

The question is: should autoneg tolerate network endpoint groups that are missing in some, but not all, of the available zones?
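
For illustration, here is a minimal sketch (in Go, and explicitly not the controller's actual code) of what such tolerance could look like when attaching NEGs to a backend service. The attachNEGBackends helper and its project, negName and zones parameters are hypothetical; the point is only that a 404 for one zone could be skipped instead of aborting the whole reconcile:

package negcheck

import (
	"errors"
	"log"
	"net/http"

	compute "google.golang.org/api/compute/v1"
	"google.golang.org/api/googleapi"
)

// attachNEGBackends is a hypothetical helper that walks the zones listed in
// the neg-status annotation and only attaches NEGs that actually exist,
// skipping zones where GKE has not created the group yet.
func attachNEGBackends(svc *compute.Service, project, negName string, zones []string) error {
	for _, zone := range zones {
		_, err := svc.NetworkEndpointGroups.Get(project, zone, negName).Do()
		if err != nil {
			var gerr *googleapi.Error
			if errors.As(err, &gerr) && gerr.Code == http.StatusNotFound {
				// NEG not created in this zone (yet): skip it and let a
				// later reconcile pick it up once pods land there.
				log.Printf("NEG %s not found in zone %s, skipping", negName, zone)
				continue
			}
			return err
		}
		// ...add the zonal NEG as a backend of the backend service here...
	}
	return nil
}

With something like this, a zone that currently has no pods (and therefore no NEG) would simply be skipped until the GKE NEG controller creates the group there.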

rosmo commented

Hey @glerchundi, how have you configured the workload? With anti-affinity for a specific zone? It's a bit interesting; generally I've seen NEGs created in all zones regardless of whether a workload is running there, but I suppose this might be an optimization in GKE.

I think I've seen this before - my hypothesis is that the GKE NEG controller adds the annotation with the NEG names before they're actually created, and thus autoneg may fail when adding those not-yet-created NEGs to the backend service.

Question: does this eventually get reconciled, or does it get stuck in a bad state?

Thanks @rosmo & @soellman for your replies!

The workload is configured with a preferred anti-affinity on zones, but depending on the number of zones GKE Autopilot has created, or on the scheduling decisions it makes (placing all the pods in the same zone, for example), there can end up being fewer NEGs than available zones.

This eventually fixes itself: killing pods, upgrading, or anything else that causes them to be rescheduled into a different zone will trigger the creation of the missing NEGs.

At the same time, the use of those NEGs in a backend service prevents them from being deleted, although I don't know whether that deletion would ever happen anyway when no pods are left in a zone.

Hope this helps in understanding the reasoning behind it!

Hey everyone,

We're observing the same issue with our GKE deployment (a VPC-native Autopilot cluster); however, in our case the deployment is simple and doesn't have any anti-affinity configuration: just a plain deployment with a standalone NEG and an external global LB.

I was able to reproduce it with a deployment as simple as this:

apiVersion: v1
kind: Namespace
metadata:
  name: echo
  labels:
    app.kubernetes.io/name: echo
---
apiVersion: v1
kind: Service
metadata:
  name: echo
  annotations:
    cloud.google.com/neg: '{"exposed_ports": {"80":{"name": "echo"}}}'
    controller.autoneg.dev/neg: '{"backend_services":{"80":[{"name":"echo","max_rate_per_endpoint":100}]}}'
  namespace: echo
spec:
  type: ClusterIP
  selector:
    app: echo
  ports:
    - port: 80
      targetPort: 8080
      name: http
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: echo
  namespace: echo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: echo
  template:
    metadata:
      labels:
        app: echo
    spec:
      containers:
        - name: echo
          image: ealen/echo-server:0.7.0
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 8080
          env:
            - name: PORT
              value: '8080'

In that case GKE annotates the service as:

Annotations:       
  cloud.google.com/neg: {"exposed_ports": {"80":{"name": "echo"}}}
  cloud.google.com/neg-status: {"network_endpoint_groups":{"80":"echo"},"zones":["europe-west1-b","europe-west1-c","europe-west1-d"]}
  controller.autoneg.dev/neg: {"backend_services":{"80":[{"name":"echo","max_rate_per_endpoint":100}]}}
  controller.autoneg.dev/neg-status: {"backend_services":{"80":{"echo":{"name":"echo","max_rate_per_endpoint":100}}},"network_endpoint_groups":{"80":"echo"},"zones":["europe-w...

However, in reality the groups are only created in two zones: europe-west1-b and europe-west1-c.