Azure/AKS

[BUG] ImageSet degrading Calico's tigera installation

Closed this issue · 2 comments

Describe the bug
The AKS-deployed ImageSet is degrading Calico's tigera installation and preventing proper installation reconciliation.

Current tigerastatus: [image]

Error is caused by ImageSet: [image]

This is the current ImageSet deployed by AKS:

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: operator.tigera.io/v1
kind: ImageSet
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"operator.tigera.io/v1","kind":"ImageSet","metadata":{"annotations":{},"labels":{"addonmanager.kubernetes.io/mode":"Reconcile"},"name":"calico-v3.26.3"},"spec":{"images":[{"digest":"sha256:c4d73d9636834e81da5a1a9d914f5967326da18aa91f268b78df151deee82a58","image":"calico/cni"},{"digest":"sha256:3d1c7be4e4259c8bb2ac582669d6905c977a11e59b5eb510933d15fd7b124987","image":"calico/kube-controllers"},{"digest":"sha256:79afaa3426e573c1ba42f6c55198ae4311422149c4939c87eb6c0d90d6e9609a","image":"calico/node"},{"digest":"sha256:db15c177c804ddc7c870cc72bc37a36499af3c744cb633714ea1911eba14433e","image":"calico/typha"},{"digest":"sha256:688277437bb230895286ad2cf5b827d5b1c8a850ad87f912dffca3c715b1a960","image":"calico/pod2daemon-flexvol"},{"digest":"sha256:b159e24deff1fe31b57d50f9c459a5ec91b4ff9fa34d8a18e2bd421c2b58d9ac","image":"calico/apiserver"},{"digest":"sha256:64765b0edd5ee3f62ccdd69b57ae389c57e23eaa8a5b1bfa0cbc446f4d66f6b8","image":"calico/csi"},{"digest":"sha256:411409512c6569aa3a29d6ee8fba6d545fc37dc9fe5b6a3117fe8887cc4252fc","image":"calico/node-driver-registrar"},{"digest":"sha256:411409512c6569aa3a29d6ee8fba6d545fc37dc9fe5b6a3117fe8887cc4252fc","image":"calico/windows-upgrade"}]}}
  creationTimestamp: "2024-07-06T11:09:53Z"
  generation: 1
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
  name: calico-v3.26.3
  resourceVersion: "2673069"
  uid: 3d0fc51e-b581-4056-bcf4-fb6e2f05ced0
spec:
  images:
  - digest: sha256:c4d73d9636834e81da5a1a9d914f5967326da18aa91f268b78df151deee82a58
    image: calico/cni
  - digest: sha256:3d1c7be4e4259c8bb2ac582669d6905c977a11e59b5eb510933d15fd7b124987
    image: calico/kube-controllers
  - digest: sha256:79afaa3426e573c1ba42f6c55198ae4311422149c4939c87eb6c0d90d6e9609a
    image: calico/node
  - digest: sha256:db15c177c804ddc7c870cc72bc37a36499af3c744cb633714ea1911eba14433e
    image: calico/typha
  - digest: sha256:688277437bb230895286ad2cf5b827d5b1c8a850ad87f912dffca3c715b1a960
    image: calico/pod2daemon-flexvol
  - digest: sha256:b159e24deff1fe31b57d50f9c459a5ec91b4ff9fa34d8a18e2bd421c2b58d9ac
    image: calico/apiserver
  - digest: sha256:64765b0edd5ee3f62ccdd69b57ae389c57e23eaa8a5b1bfa0cbc446f4d66f6b8
    image: calico/csi
  - digest: sha256:411409512c6569aa3a29d6ee8fba6d545fc37dc9fe5b6a3117fe8887cc4252fc
    image: calico/node-driver-registrar
  - digest: sha256:411409512c6569aa3a29d6ee8fba6d545fc37dc9fe5b6a3117fe8887cc4252fc
    image: calico/windows-upgrade

This results in tigera-operator errors such as:

{"level":"error","ts":"2024-10-18T15:39:33Z","msg":"Reconciler error","controller":"tigera-installation-controller","object":{"name":"periodic-5m0s-reconcile-event"},"namespace":"","name":"periodic-5m0s-reconcile-event","reconcileID":"c536ee6c-1c1d-4df8-932a-663ec508de15","error":"ImageSets exist but none with the expected name calico-v3.28.1","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.3/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.3/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.3/pkg/internal/controller/controller.go:226"}

This also prevents proper reconciliation of installations.operator.tigera.io (e.g., enabling Typha metrics does not work under these conditions, although it normally does).
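For anyone triaging several clusters, here is a minimal Python sketch that pulls the expected ImageSet name out of an operator log line like the one above. The function name and regular expression are my own, and the pattern is based only on the error text quoted in this issue, so other operator versions may word the message differently.

```python
import json
import re
from typing import Optional

# Assumption: the message wording matches the error quoted in this issue.
EXPECTED_NAME_RE = re.compile(r"none with the expected name (\S+)")


def expected_imageset_name(log_line: str) -> Optional[str]:
    """Return the ImageSet name the operator expected, or None if not found."""
    try:
        record = json.loads(log_line)
    except json.JSONDecodeError:
        return None  # not a JSON log line
    if not isinstance(record, dict):
        return None
    match = EXPECTED_NAME_RE.search(str(record.get("error", "")))
    return match.group(1) if match else None


# Shortened copy of the log line from this issue (stacktrace omitted).
sample = (
    '{"level":"error","msg":"Reconciler error",'
    '"controller":"tigera-installation-controller",'
    '"error":"ImageSets exist but none with the expected name calico-v3.28.1"}'
)
print(expected_imageset_name(sample))  # prints "calico-v3.28.1"
```

Comparing that name with the names returned by kubectl get imageset shows which ImageSet the operator is missing.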

The ImageSet was potentially deployed as part of issue #4280. If that is the case, the fix for a smaller issue specific to arm64 system pools may be impacting clients at large.

To Reproduce
Steps to reproduce the behavior:

  1. Run kubectl get imageset and notice the 3.26.3 imageset
  2. Run kubectl describe tigerastatuses.operator.tigera.io calico and notice the degraded status caused by the ImageSet
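The check in step 1 can be sketched in Python as follows. This is a rough illustration, not the operator's actual logic: it assumes the condition is exactly what the error message says (ImageSets exist, but none has the expected name), and the helper names are hypothetical. fetch_imagesets shells out to kubectl and is only defined here, not called.

```python
import json
import subprocess
from typing import List


def mismatched_imagesets(imageset_list: dict, expected_name: str) -> List[str]:
    """Return the ImageSet names that would trigger the error, if any.

    Assumption: the operator complains only when ImageSets exist and none
    of them carries the expected name (as the error message suggests).
    """
    names = [item["metadata"]["name"] for item in imageset_list.get("items", [])]
    if expected_name in names:
        return []  # a matching ImageSet exists, nothing to report
    return names  # empty if no ImageSets exist at all


def fetch_imagesets() -> dict:
    """Fetch ImageSets from the current cluster (requires kubectl access)."""
    result = subprocess.run(
        ["kubectl", "get", "imagesets.operator.tigera.io", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Offline example mirroring the state described in this issue.
cluster_state = {"items": [{"metadata": {"name": "calico-v3.26.3"}}]}
print(mismatched_imagesets(cluster_state, "calico-v3.28.1"))  # prints ['calico-v3.26.3']
```

On a live cluster, mismatched_imagesets(fetch_imagesets(), "calico-v3.28.1") would run the same check against real data.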

Expected behavior
The ImageSet either does not exist or points to the proper version, allowing tigera-operator to function normally and properly reconcile installations.

Environment (please complete the following information):

  • CLI Version: v1.31.0
  • Kubernetes version: v1.30.3

Additional context

Deleting the ImageSet resolves the issue and upgrades Calico to 3.28.1. However, I have no way to tell whether the latest version could cause other issues with the current AKS versions, since there is likely a reason this ImageSet was deployed in the first place.

Also note that this problem is occurring on 4 of our clusters, not specific to one.

A change to purge that ImageSet is going out with what should be a 10/22 release, but is really whatever release follows the 10/06 release on https://releases.aks.azure.com/

You should feel free/encouraged to delete this ImageSet if you see it in any cluster. The fact that it wasn't deleted on 3.26.3 is due to CRs not being included in our addon manager's prune list.

Thank you for that, will delete the ImageSet.