v1.2.5 fails to start - lib64/libc.so.6: version `GLIBC_2.34' not found
1337andre opened this issue · 3 comments
Hi folks,
with the image quay.io/redhat-cop/namespace-configuration-operator:v1.2.5, the namespace-configuration-operator fails to start with:
lib64/libc.so.6: version `GLIBC_2.34' not found
# kubectl describe pod namespace-configuration-operator-67dc8fdbbb-z7xwd
Name:                 namespace-configuration-operator-67dc8fdbbb-z7xwd
Namespace:            default
Priority:             1
Priority Class Name:  low
Service Account:      controller-manager
Node:                 rxcmpk8s19.hcp-infra.blub.de/172.31.102.101
Start Time:           Tue, 14 Nov 2023 08:26:13 +0100
Labels:               app.kubernetes.io/instance=namespace-configuration-operator
                      app.kubernetes.io/name=namespace-configuration-operator
                      control-plane=namespace-configuration-operator
                      pod-template-hash=67dc8fdbbb
Annotations:          cni.projectcalico.org/containerID: 24d87fc3831cd061e868876406b8b5cb1c393f09a49b7a21ec17f6012596ab9c
                      cni.projectcalico.org/podIP: 10.4.192.202/32
                      cni.projectcalico.org/podIPs: 10.4.192.202/32
                      kubectl.kubernetes.io/restartedAt: 2023-09-01T11:43:34+02:00
Status:               Running
IP:                   10.4.192.202
IPs:
  IP:  10.4.192.202
Controlled By:  ReplicaSet/namespace-configuration-operator-67dc8fdbbb
Containers:
  kube-rbac-proxy:
    Container ID:  containerd://7c5362a8759c282136bc91d150d392d3c07dad903a7645a3d2feafcf80c76cb0
    Image:         quay.io/redhat-cop/kube-rbac-proxy:v0.11.0
    Image ID:      quay.io/redhat-cop/kube-rbac-proxy@sha256:c68135620167c41e3d9f6c1d2ca1eb8fa24312b86186d09b8010656b9d25fb47
    Port:          8443/TCP
    Host Port:     0/TCP
    Args:
      --secure-listen-address=0.0.0.0:8443
      --upstream=http://127.0.0.1:8080/
      --logtostderr=true
      --tls-cert-file=/etc/certs/tls/tls.crt
      --tls-private-key-file=/etc/certs/tls/tls.key
      --v=10
    State:          Running
      Started:      Tue, 14 Nov 2023 08:26:15 +0100
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:     100m
      memory:  20Mi
    Environment:  <none>
    Mounts:
      /etc/certs/tls from namespace-configuration-operator-certs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ln4t4 (ro)
  namespace-configuration-operator:
    Container ID:  containerd://7773f9892338e07a8f25b4fd2b2c1907598555e5fa442085716ab02f3e12a18c
    Image:         quay.io/redhat-cop/namespace-configuration-operator:v1.2.5
    Image ID:      quay.io/redhat-cop/namespace-configuration-operator@sha256:20debaa7b91aebf034a0bd2baa80d37bac2b23be5c76efc2a6edbc14f942b2b1
    Port:          <none>
    Host Port:     <none>
    Command:
      /manager
    Args:
      --leader-elect
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 14 Nov 2023 08:37:17 +0100
      Finished:     Tue, 14 Nov 2023 08:37:17 +0100
    Ready:          False
    Restart Count:  7
    Requests:
      cpu:     100m
      memory:  20Mi
    Liveness:     http-get http://:8081/healthz delay=15s timeout=1s period=20s #success=1 #failure=3
    Readiness:    http-get http://:8081/readyz delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from webhook-server-cert (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-ln4t4 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  namespace-configuration-operator-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  namespace-configuration-operator-certs
    Optional:    false
  webhook-server-cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  webhook-server-cert
    Optional:    false
  kube-api-access-ln4t4:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  14m                   default-scheduler  Successfully assigned default/namespace-configuration-operator-67dc8fdbbb-z7xwd to rxcmpk8s19.hcp-infra.blub.de
  Normal   Pulled     14m                   kubelet            Container image "quay.io/redhat-cop/kube-rbac-proxy:v0.11.0" already present on machine
  Normal   Created    14m                   kubelet            Created container kube-rbac-proxy
  Normal   Started    14m                   kubelet            Started container kube-rbac-proxy
  Normal   Pulling    14m                   kubelet            Pulling image "quay.io/redhat-cop/namespace-configuration-operator:v1.2.5"
  Normal   Pulled     14m                   kubelet            Successfully pulled image "quay.io/redhat-cop/namespace-configuration-operator:v1.2.5" in 6.834695195s (6.834708313s including waiting)
  Normal   Created    13m (x4 over 14m)     kubelet            Created container namespace-configuration-operator
  Normal   Started    13m (x4 over 14m)     kubelet            Started container namespace-configuration-operator
  Normal   Pulled     13m (x3 over 14m)     kubelet            Container image "quay.io/redhat-cop/namespace-configuration-operator:v1.2.5" already present on machine
  Warning  BackOff    4m27s (x52 over 14m)  kubelet            Back-off restarting failed container namespace-configuration-operator in pod namespace-configuration-operator-67dc8fdbbb-z7xwd_default(4d535843-f87d-4e36-86d8-669881710c75)
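For anyone debugging this: the GLIBC error itself shows up in the logs of the crashed container, which you can read with:

# kubectl logs namespace-configuration-operator-67dc8fdbbb-z7xwd -c namespace-configuration-operator --previous

The error typically means the /manager binary was linked against a newer glibc (2.34) than the one shipped in the runtime base image. One way to confirm which glibc versions the binary requires (a sketch; assumes podman and binutils are available on your workstation, and uses the /manager entrypoint shown in the pod spec above):

# podman create --name ncop quay.io/redhat-cop/namespace-configuration-operator:v1.2.5
# podman cp ncop:/manager ./manager && podman rm ncop
# objdump -T ./manager | grep -o 'GLIBC_[0-9.]*' | sort -uV

If GLIBC_2.34 appears in that list while the base image's /lib64/libc.so.6 is older, the binary was built on a newer builder image than it runs on; for a pure-Go operator, building with CGO_ENABLED=0 produces a static binary that sidesteps the mismatch entirely.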
The failed version 1.2.5 has just been published to the OperatorHub community-operators catalog:
redhat-openshift-ecosystem/community-operators-prod#3593
So the error is now starting to reach everyone who installs the operator through OperatorHub:
2023-11-22T09:54:58Z INFO Starting workers {"controller": "namespaceconfig", "controllerGroup": "redhatcop.redhat.io", "controllerKind": "NamespaceConfig", "worker count": 1}
2023-11-22T09:54:58Z INFO controllers.NamespaceConfig reconciling started {"namespaceconfig": {"name":"basic-user-namespace-monitoring"}}
2023-11-22T09:54:59Z INFO Observed a panic in reconciler: interface conversion: validation.Schema is *validation.schemaValidation, not *validation.NullSchema {"controller": "namespaceconfig", "controllerGroup": "redhatcop.redhat.io", "controllerKind": "NamespaceConfig", "NamespaceConfig": {"name":"basic-user-namespace-monitoring"}, "namespace": "", "name": "basic-user-namespace-monitoring", "reconcileID": "d1c4208b-33eb-444a-941c-d2af11af0852"}
panic: interface conversion: validation.Schema is *validation.schemaValidation, not *validation.NullSchema [recovered]
panic: interface conversion: validation.Schema is *validation.schemaValidation, not *validation.NullSchema
goroutine 209 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.2/pkg/internal/controller/controller.go:115 +0x1e5
panic({0x22be620?, 0xc0041aa660?})
/opt/hostedtoolcache/go/1.21.4/x64/src/runtime/panic.go:914 +0x21f
github.com/redhat-cop/operator-utils/pkg/util/lockedresourcecontroller.(*LockedResourceManager).validateLockedResources(0xc000854380, {0xc000990800, 0x34, 0xc000637a60?})
/home/runner/go/pkg/mod/github.com/redhat-cop/operator-utils@v1.3.7/pkg/util/lockedresourcecontroller/locked-resource-manager.go:341 +0x6f3
github.com/redhat-cop/operator-utils/pkg/util/lockedresourcecontroller.(*LockedResourceManager).SetResources(0xc000854380, {0xc000990800?, 0x34, 0x40})
/home/runner/go/pkg/mod/github.com/redhat-cop/operator-utils@v1.3.7/pkg/util/lockedresourcecontroller/locked-resource-manager.go:82 +0x77
github.com/redhat-cop/operator-utils/pkg/util/lockedresourcecontroller.(*LockedResourceManager).Restart(0xc000854380, {0x27c5e68, 0xc00075e660}, {0xc000990800, 0x34, 0x40}, {0x3677360, 0x0, 0x0}, 0x0, ...)
/home/runner/go/pkg/mod/github.com/redhat-cop/operator-utils@v1.3.7/pkg/util/lockedresourcecontroller/locked-resource-manager.go:222 +0x16c
github.com/redhat-cop/operator-utils/pkg/util/lockedresourcecontroller.(*EnforcingReconciler).UpdateLockedResourcesWithRestConfig(0xc0000f0f20, {0x27c5e68, 0xc00075e660}, {0x27db750?, 0xc00061a1a0?}, {0xc000990800, 0x34, 0x40}, {0x3677360, 0x0, ...}, ...)
/home/runner/go/pkg/mod/github.com/redhat-cop/operator-utils@v1.3.7/pkg/util/lockedresourcecontroller/enforcing-reconciler.go:117 +0x3bc
github.com/redhat-cop/operator-utils/pkg/util/lockedresourcecontroller.(*EnforcingReconciler).UpdateLockedResources(...)
/home/runner/go/pkg/mod/github.com/redhat-cop/operator-utils@v1.3.7/pkg/util/lockedresourcecontroller/enforcing-reconciler.go:91
github.com/redhat-cop/namespace-configuration-operator/controllers.(*NamespaceConfigReconciler).Reconcile(0xc0000f0f20, {0x27c5e68, 0xc00075e660}, {{{0x0, 0x0}, {0xc00025c880, 0x1f}}})
/home/runner/work/namespace-configuration-operator/namespace-configuration-operator/controllers/namespaceconfig_controller.go:121 +0x83c
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x27c5e68?, {0x27c5e68?, 0xc00075e660?}, {{{0x0?, 0x21c61c0?}, {0xc00025c880?, 0x27b66d0?}}})
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.2/pkg/internal/controller/controller.go:118 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc00027d900, {0x27c5ea0, 0xc0003f2690}, {0x233bc20?, 0xc0000626c0?})
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.2/pkg/internal/controller/controller.go:314 +0x365
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc00027d900, {0x27c5ea0, 0xc0003f2690})
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.2/pkg/internal/controller/controller.go:265 +0x1c9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.2/pkg/internal/controller/controller.go:226 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 70
/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.2/pkg/internal/controller/controller.go:222 +0x565
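For context, this panic is a plain unchecked Go type assertion: x.(*T) panics when the dynamic type of x is not *T, while the comma-ok form lets the caller handle the mismatch. A minimal sketch of the failure mode (the types here are illustrative stand-ins, not the actual operator-utils code):

package main

import "fmt"

// Schema plays the role of validation.Schema from the trace above.
type Schema interface{ Validate() error }

type schemaValidation struct{}
type nullSchema struct{}

func (*schemaValidation) Validate() error { return nil }
func (*nullSchema) Validate() error       { return nil }

func main() {
	var s Schema = &schemaValidation{}

	// Comma-ok assertion: safe, reports the mismatch instead of panicking.
	if _, ok := s.(*nullSchema); !ok {
		fmt.Println("not a *nullSchema; fall back gracefully")
	}

	// Unchecked assertion: panics with "interface conversion: main.Schema
	// is *main.schemaValidation, not *main.nullSchema" -- the same failure
	// as locked-resource-manager.go:341 in the trace above.
	_ = s.(*nullSchema)
}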
With this issue, we were receiving constant Prometheus TargetDown alerts: the new operator version 1.2.5 cannot start because of the panic, and OpenShift has a generic alert called TargetDown that checks that the operator's metrics endpoint is alive, which it is not, so all our clusters were affected.
Until there is a fix in a newer version, I have mitigated the issue as follows:
- Disabled ArgoCD autosync
- Uninstalled the failed operator version 1.2.5
- Forced the old operator version 1.2.4 with Manual install plan approval. That way, once the correct 1.2.4 is manually accepted and installed, it won't jump automatically to the failed 1.2.5, because every upgrade requires manual approval:
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  annotations:
    argocd.argoproj.io/sync-wave: "-1"
  name: namespace-configuration-operator
spec:
  channel: alpha
  installPlanApproval: Manual
  startingCSV: namespace-configuration-operator.v1.2.4
  name: namespace-configuration-operator
  source: community-operators
  sourceNamespace: openshift-marketplace
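With installPlanApproval: Manual, OLM pauses at every upgrade until the pending InstallPlan is approved. A sketch of inspecting and approving one by hand (the namespace and the install-xxxxx name are placeholders for your environment):

# kubectl get installplan -n openshift-operators
# kubectl patch installplan install-xxxxx -n openshift-operators --type merge -p '{"spec":{"approved":true}}'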