redhat-cop/namespace-configuration-operator

v1.2.5 fails to start - lib64/libc.so.6: version `GLIBC_2.34' not found

1337andre opened this issue · 3 comments

Hi folks,

with image quay.io/redhat-cop/namespace-configuration-operator:v1.2.5, the namespace-configuration-operator fails to start:

lib64/libc.so.6: version `GLIBC_2.34' not found
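This usually means the /manager binary was compiled against a newer glibc than the one shipped in the runtime base image. A quick way to confirm the mismatch, as a sketch assuming podman and binutils are available on a host (docker works the same way):

# Create a container from the image and extract the /manager binary:
ctr=$(podman create quay.io/redhat-cop/namespace-configuration-operator:v1.2.5 /manager)
podman cp "$ctr":/manager ./manager
podman rm "$ctr"
# List the glibc symbol versions the binary requires:
objdump -T ./manager | grep -o 'GLIBC_[0-9.]*' | sort -uV
# If GLIBC_2.34 appears here but the base image's libc is older, the dynamic
# loader aborts on startup with exactly the error quoted above.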
# kubectl describe pod namespace-configuration-operator-67dc8fdbbb-z7xwd
Name:                 namespace-configuration-operator-67dc8fdbbb-z7xwd
Namespace:            default
Priority:             1
Priority Class Name:  low
Service Account:      controller-manager
Node:                 rxcmpk8s19.hcp-infra.blub.de/172.31.102.101
Start Time:           Tue, 14 Nov 2023 08:26:13 +0100
Labels:               app.kubernetes.io/instance=namespace-configuration-operator
                      app.kubernetes.io/name=namespace-configuration-operator
                      control-plane=namespace-configuration-operator
                      pod-template-hash=67dc8fdbbb
Annotations:          cni.projectcalico.org/containerID: 24d87fc3831cd061e868876406b8b5cb1c393f09a49b7a21ec17f6012596ab9c
                      cni.projectcalico.org/podIP: 10.4.192.202/32
                      cni.projectcalico.org/podIPs: 10.4.192.202/32
                      kubectl.kubernetes.io/restartedAt: 2023-09-01T11:43:34+02:00
Status:               Running
IP:                   10.4.192.202
IPs:
  IP:           10.4.192.202
Controlled By:  ReplicaSet/namespace-configuration-operator-67dc8fdbbb
Containers:
  kube-rbac-proxy:
    Container ID:  containerd://7c5362a8759c282136bc91d150d392d3c07dad903a7645a3d2feafcf80c76cb0
    Image:         quay.io/redhat-cop/kube-rbac-proxy:v0.11.0
    Image ID:      quay.io/redhat-cop/kube-rbac-proxy@sha256:c68135620167c41e3d9f6c1d2ca1eb8fa24312b86186d09b8010656b9d25fb47
    Port:          8443/TCP
    Host Port:     0/TCP
    Args:
      --secure-listen-address=0.0.0.0:8443
      --upstream=http://127.0.0.1:8080/
      --logtostderr=true
      --tls-cert-file=/etc/certs/tls/tls.crt
      --tls-private-key-file=/etc/certs/tls/tls.key
      --v=10
    State:          Running
      Started:      Tue, 14 Nov 2023 08:26:15 +0100
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        100m
      memory:     20Mi
    Environment:  <none>
    Mounts:
      /etc/certs/tls from namespace-configuration-operator-certs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ln4t4 (ro)
  namespace-configuration-operator:
    Container ID:  containerd://7773f9892338e07a8f25b4fd2b2c1907598555e5fa442085716ab02f3e12a18c
    Image:         quay.io/redhat-cop/namespace-configuration-operator:v1.2.5
    Image ID:      quay.io/redhat-cop/namespace-configuration-operator@sha256:20debaa7b91aebf034a0bd2baa80d37bac2b23be5c76efc2a6edbc14f942b2b1
    Port:          <none>
    Host Port:     <none>
    Command:
      /manager
    Args:
      --leader-elect
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 14 Nov 2023 08:37:17 +0100
      Finished:     Tue, 14 Nov 2023 08:37:17 +0100
    Ready:          False
    Restart Count:  7
    Requests:
      cpu:        100m
      memory:     20Mi
    Liveness:     http-get http://:8081/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:    http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from webhook-server-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ln4t4 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  namespace-configuration-operator-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  namespace-configuration-operator-certs
    Optional:    false
  webhook-server-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  webhook-server-cert
    Optional:    false
  kube-api-access-ln4t4:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  14m                   default-scheduler  Successfully assigned default/namespace-configuration-operator-67dc8fdbbb-z7xwd to rxcmpk8s19.hcp-infra.blub.de
  Normal   Pulled     14m                   kubelet            Container image "quay.io/redhat-cop/kube-rbac-proxy:v0.11.0" already present on machine
  Normal   Created    14m                   kubelet            Created container kube-rbac-proxy
  Normal   Started    14m                   kubelet            Started container kube-rbac-proxy
  Normal   Pulling    14m                   kubelet            Pulling image "quay.io/redhat-cop/namespace-configuration-operator:v1.2.5"
  Normal   Pulled     14m                   kubelet            Successfully pulled image "quay.io/redhat-cop/namespace-configuration-operator:v1.2.5" in 6.834695195s (6.834708313s including waiting)
  Normal   Created    13m (x4 over 14m)     kubelet            Created container namespace-configuration-operator
  Normal   Started    13m (x4 over 14m)     kubelet            Started container namespace-configuration-operator
  Normal   Pulled     13m (x3 over 14m)     kubelet            Container image "quay.io/redhat-cop/namespace-configuration-operator:v1.2.5" already present on machine
  Warning  BackOff    4m27s (x52 over 14m)  kubelet            Back-off restarting failed container namespace-configuration-operator in pod namespace-configuration-operator-67dc8fdbbb-z7xwd_default(4d535843-f87d-4e36-86d8-669881710c75)
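Note that the describe output only shows CrashLoopBackOff; the loader error itself comes from the crashed container's previous logs:

# Fetch the log of the last failed run of the manager container:
kubectl logs namespace-configuration-operator-67dc8fdbbb-z7xwd \
  -c namespace-configuration-operator --previous
# prints the GLIBC error quoted at the top of this issue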

The failing version 1.2.5 has just been published to OperatorHub community-operators:
redhat-openshift-ecosystem/community-operators-prod#3593

So the error is starting to hit everyone using the operator through OperatorHub:

2023-11-22T09:54:58Z INFO Starting workers {"controller": "namespaceconfig", "controllerGroup": "redhatcop.redhat.io", "controllerKind": "NamespaceConfig", "worker count": 1}
2023-11-22T09:54:58Z INFO controllers.NamespaceConfig reconciling started {"namespaceconfig": {"name":"basic-user-namespace-monitoring"}}
2023-11-22T09:54:59Z INFO Observed a panic in reconciler: interface conversion: validation.Schema is *validation.schemaValidation, not *validation.NullSchema {"controller": "namespaceconfig", "controllerGroup": "redhatcop.redhat.io", "controllerKind": "NamespaceConfig", "NamespaceConfig": {"name":"basic-user-namespace-monitoring"}, "namespace": "", "name": "basic-user-namespace-monitoring", "reconcileID": "d1c4208b-33eb-444a-941c-d2af11af0852"}
panic: interface conversion: validation.Schema is *validation.schemaValidation, not *validation.NullSchema [recovered]
panic: interface conversion: validation.Schema is *validation.schemaValidation, not *validation.NullSchema
goroutine 209 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.2/pkg/internal/controller/controller.go:115 +0x1e5
panic({0x22be620?, 0xc0041aa660?})
/opt/hostedtoolcache/go/1.21.4/x64/src/runtime/panic.go:914 +0x21f
github.com/redhat-cop/operator-utils/pkg/util/lockedresourcecontroller.(*LockedResourceManager).validateLockedResources(0xc000854380, {0xc000990800, 0x34, 0xc000637a60?})
/home/runner/go/pkg/mod/github.com/redhat-cop/operator-utils@v1.3.7/pkg/util/lockedresourcecontroller/locked-resource-manager.go:341 +0x6f3
github.com/redhat-cop/operator-utils/pkg/util/lockedresourcecontroller.(*LockedResourceManager).SetResources(0xc000854380, {0xc000990800?, 0x34, 0x40})
/home/runner/go/pkg/mod/github.com/redhat-cop/operator-utils@v1.3.7/pkg/util/lockedresourcecontroller/locked-resource-manager.go:82 +0x77
github.com/redhat-cop/operator-utils/pkg/util/lockedresourcecontroller.(*LockedResourceManager).Restart(0xc000854380, {0x27c5e68, 0xc00075e660}, {0xc000990800, 0x34, 0x40}, {0x3677360, 0x0, 0x0}, 0x0, ...)
/home/runner/go/pkg/mod/github.com/redhat-cop/operator-utils@v1.3.7/pkg/util/lockedresourcecontroller/locked-resource-manager.go:222 +0x16c
github.com/redhat-cop/operator-utils/pkg/util/lockedresourcecontroller.(*EnforcingReconciler).UpdateLockedResourcesWithRestConfig(0xc0000f0f20, {0x27c5e68, 0xc00075e660}, {0x27db750?, 0xc00061a1a0?}, {0xc000990800, 0x34, 0x40}, {0x3677360, 0x0, ...}, ...)
/home/runner/go/pkg/mod/github.com/redhat-cop/operator-utils@v1.3.7/pkg/util/lockedresourcecontroller/enforcing-reconciler.go:117 +0x3bc
github.com/redhat-cop/operator-utils/pkg/util/lockedresourcecontroller.(*EnforcingReconciler).UpdateLockedResources(...)
/home/runner/go/pkg/mod/github.com/redhat-cop/operator-utils@v1.3.7/pkg/util/lockedresourcecontroller/enforcing-reconciler.go:91
github.com/redhat-cop/namespace-configuration-operator/controllers.(*NamespaceConfigReconciler).Reconcile(0xc0000f0f20, {0x27c5e68, 0xc00075e660}, {{{0x0, 0x0}, {0xc00025c880, 0x1f}}})
/home/runner/work/namespace-configuration-operator/namespace-configuration-operator/controllers/namespaceconfig_controller.go:121 +0x83c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x27c5e68?, {0x27c5e68?, 0xc00075e660?}, {{{0x0?, 0x21c61c0?}, {0xc00025c880?, 0x27b66d0?}}})
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.2/pkg/internal/controller/controller.go:118 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00027d900, {0x27c5ea0, 0xc0003f2690}, {0x233bc20?, 0xc0000626c0?})
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.2/pkg/internal/controller/controller.go:314 +0x365
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00027d900, {0x27c5ea0, 0xc0003f2690})
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.2/pkg/internal/controller/controller.go:265 +0x1c9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.2/pkg/internal/controller/controller.go:226 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 70
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.2/pkg/internal/controller/controller.go:222 +0x565

With this issue we were receiving constant Prometheus TargetDown alerts across all clusters: the new operator version 1.2.5 cannot start because of the panic above, and OpenShift ships a generic TargetDown alert that checks that the operator's metrics endpoint is alive, which in this state it is not.
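For anyone triaging the alert, the scrape target behind TargetDown can be probed directly. A sketch, assuming the controller-manager service account token is authorized by the proxy's RBAC config (deployment name, service account, and port are taken from the pod description above):

# Forward the kube-rbac-proxy port and probe the metrics endpoint:
kubectl port-forward deploy/namespace-configuration-operator 8443:8443 &
sleep 2
curl -k -H "Authorization: Bearer $(kubectl create token controller-manager)" \
  https://localhost:8443/metrics
# While the manager container is crashlooping, the proxy's upstream on
# 127.0.0.1:8080 is down, so no metrics are served and the target stays down.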

Until a fix lands in a newer version, I have mitigated the issue for now:

  • Disabled ArgoCD autosync
  • Uninstalled the failed operator version 1.2.5 (see the sketch after the Subscription below)
  • Pinned the old operator version 1.2.4 with Manual install plan approval. That way, once 1.2.4 is manually approved and installed, it won't automatically jump to the failing 1.2.5, because every upgrade requires manual approval:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
  name: namespace-configuration-operator
spec:
  channel: alpha
  installPlanApproval: Manual
  startingCSV: namespace-configuration-operator.v1.2.4
  name: namespace-configuration-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
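For reference, a sketch of the kubectl side of steps 2 and 3 (the namespace and InstallPlan name are placeholders; the CSV name follows the startingCSV pattern above):

# Remove the failed 1.2.5 install (Subscription first, then its CSV):
kubectl delete subscription namespace-configuration-operator -n <operator-namespace>
kubectl delete csv namespace-configuration-operator.v1.2.5 -n <operator-namespace>
# After re-creating the Subscription above, approve the pending InstallPlan for 1.2.4:
kubectl get installplan -n <operator-namespace>
kubectl patch installplan <installplan-name> -n <operator-namespace> \
  --type merge -p '{"spec":{"approved":true}}'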

This seems to be fixed with v1.2.6:
#169

/cc @slopezz @raffaelespazzoli