nsconfigurator stuck in upgrade
davidkarlsen opened this issue · 19 comments
The upgrade just sits there, reporting
"install strategy completed with no errors"
but also
"one or more requirements couldn't be found"
What is not met, and why? Why did it upgrade in the first place if the requirements cannot be met?
Reason: InstallSucceeded
Last Transition Time: 2021-02-26T23:17:11Z
Last Update Time: 2021-02-26T23:17:12Z
Message: installing: waiting for deployment namespace-configuration-operator-controller-manager to become ready: Waiting for rollout to finish: 1 old replicas are pending termination...
Phase: Installing
Reason: InstallWaiting
Last Transition Time: 2021-02-26T23:17:34Z
Last Update Time: 2021-02-26T23:17:34Z
Message: install strategy completed with no errors
Phase: Succeeded
Reason: InstallSucceeded
Last Transition Time: 2021-03-02T22:50:12Z
Last Update Time: 2021-03-02T22:50:12Z
Message: installing: waiting for deployment namespace-configuration-operator-controller-manager to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...
Phase: Failed
Reason: ComponentUnhealthy
Last Transition Time: 2021-03-02T22:50:14Z
Last Update Time: 2021-03-02T22:50:14Z
Message: installing: waiting for deployment namespace-configuration-operator-controller-manager to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...
Phase: Pending
Reason: NeedsReinstall
Last Transition Time: 2021-03-02T22:50:15Z
Last Update Time: 2021-03-02T22:50:15Z
Message: all requirements found, attempting install
Phase: InstallReady
Reason: AllRequirementsMet
Last Transition Time: 2021-03-02T22:50:17Z
Last Update Time: 2021-03-02T22:50:17Z
Message: waiting for install components to report healthy
Phase: Installing
Reason: InstallSucceeded
Last Transition Time: 2021-03-02T22:50:17Z
Last Update Time: 2021-03-02T22:50:18Z
Message: installing: waiting for deployment namespace-configuration-operator-controller-manager to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...
Phase: Installing
Reason: InstallWaiting
Last Transition Time: 2021-03-02T22:51:02Z
Last Update Time: 2021-03-02T22:51:02Z
Message: install strategy completed with no errors
Phase: Succeeded
Reason: InstallSucceeded
Last Transition Time: 2021-03-03T18:12:23Z
Last Update Time: 2021-03-03T18:12:23Z
Message: requirements no longer met
Phase: Failed
Reason: RequirementsNotMet
Last Transition Time: 2021-03-03T18:12:28Z
Last Update Time: 2021-03-03T18:12:28Z
Message: requirements not met
Phase: Pending
Reason: RequirementsNotMet
Last Transition Time: 2021-03-03T18:12:28Z
Last Update Time: 2021-03-03T18:12:28Z
Message: one or more requirements couldn't be found
Phase: Pending
Reason: RequirementsNotMet
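The condition history above is easier to follow when collapsed to one line per transition. A minimal sketch, assuming the conditions have been exported as JSON (for example with `oc get csv <name> -o json`, where the list sits under `status.conditions`). The helper names `summarize` and `first_regression` are hypothetical, and the sample is trimmed from the last three entries above:

```python
import json

# Sample trimmed from the history above. Field names (phase, reason, message,
# lastTransitionTime) are assumed to match the CSV status JSON.
SAMPLE = json.loads("""
[
  {"lastTransitionTime": "2021-03-02T22:51:02Z", "phase": "Succeeded",
   "reason": "InstallSucceeded", "message": "install strategy completed with no errors"},
  {"lastTransitionTime": "2021-03-03T18:12:23Z", "phase": "Failed",
   "reason": "RequirementsNotMet", "message": "requirements no longer met"},
  {"lastTransitionTime": "2021-03-03T18:12:28Z", "phase": "Pending",
   "reason": "RequirementsNotMet", "message": "one or more requirements couldn't be found"}
]
""")


def summarize(conditions):
    """Return one 'time phase (reason)' line per condition, oldest first."""
    # Timestamps share one UTC ISO-8601 format, so string order is time order.
    ordered = sorted(conditions, key=lambda c: c["lastTransitionTime"])
    return [f'{c["lastTransitionTime"]} {c["phase"]} ({c["reason"]})' for c in ordered]


def first_regression(conditions):
    """Return the first condition that leaves the Succeeded phase, or None."""
    ordered = sorted(conditions, key=lambda c: c["lastTransitionTime"])
    for prev, cur in zip(ordered, ordered[1:]):
        if prev["phase"] == "Succeeded" and cur["phase"] != "Succeeded":
            return cur
    return None


if __name__ == "__main__":
    for line in summarize(SAMPLE):
        print(line)
```

Here `first_regression` picks out the 2021-03-03T18:12:23Z entry, which is the point where the CSV went from Succeeded to Failed with reason RequirementsNotMet.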
@davidkarlsen thank you for reaching out. Could you give us some additional information about your cluster?
- Is it vanilla k8s, OpenShift, etc.?
- Did you install through the Operator Hub console or manually via the CLI?
- Can you run
kubectl get events
within the namespace you installed the operator in and paste the output here?
@davidkarlsen thank you for reaching out. Could you give us some additional information about your cluster?
- Is it vanilla k8s, OpenShift, etc.?
openshift v4.6.latest
- Did you install through the Operator Hub console or manually via the CLI?
console, using auto-upgrade
- Can you run
kubectl get events
within the namespace you installed the operator in and paste the output here?
namespace-configuration-operator-controller-manager-6846f6gt4tl 1/1 Running 0 33h
[et2448@Davids-Work-MacBook-Pro base-ubuntu (⎈ |kube-system/api-os-global-finods-com:6443/david.karlsen@evry.com:openshift-operators)]$ k get events
LAST SEEN TYPE REASON OBJECT MESSAGE
92m Warning Unhealthy pod/namespace-configuration-operator-controller-manager-6846f6gt4tl Readiness probe failed: Get "http://10.200.10.53:8081/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Can you share the operator logs?
well, the running operator is the old version, so that's maybe not so interesting.
2021-03-04T12:39:27.287Z INFO controllers.NamespaceConfig reconciling started {"namespaceconfig": "/fss-apps"}
2021-03-04T12:39:27.298Z INFO controllers.NamespaceConfig reconciling started {"namespaceconfig": "/deployer-role"}
2021-03-04T12:42:40.922Z INFO controllers.NamespaceConfig reconciling started {"namespaceconfig": "/deployer-role"}
2021-03-04T12:42:40.933Z INFO controllers.NamespaceConfig reconciling started {"namespaceconfig": "/deployer-role"}
2021-03-04T12:42:40.945Z INFO controllers.NamespaceConfig reconciling started {"namespaceconfig": "/dev-env-admins"}
2021-03-04T12:42:40.955Z INFO controllers.NamespaceConfig reconciling started {"namespaceconfig": "/dev-env-admins"}
2021-03-04T12:42:40.967Z INFO controllers.NamespaceConfig reconciling started {"namespaceconfig": "/fss-apps"}
2021-03-04T12:42:40.981Z INFO controllers.NamespaceConfig reconciling started {"namespaceconfig": "/resource-quota-large"}
2021-03-04T12:42:40.990Z INFO controllers.NamespaceConfig reconciling started {"namespaceconfig": "/fss-apps"}
2021-03-04T12:42:41.002Z INFO controllers.NamespaceConfig reconciling started {"namespaceconfig": "/resource-quota-large"}
2021-03-04T12:42:41.012Z INFO controllers.NamespaceConfig reconciling started {"namespaceconfig": "/resource-quota-medium"}
2021-03-04T12:42:41.020Z INFO controllers.NamespaceConfig reconciling started {"namespaceconfig": "/resource-quota-small"}
2021-03-04T12:42:41.028Z INFO controllers.NamespaceConfig reconciling started {"namespaceconfig": "/resource-quota-medium"}
2021-03-04T12:42:41.036Z INFO controllers.NamespaceConfig reconciling started {"namespaceconfig": "/resource-quota-small"}
2021-03-04T12:42:41.047Z INFO controllers.NamespaceConfig reconciling started {"namespaceconfig": "/test-env-admins"}
2021-03-04T12:42:41.059Z INFO controllers.NamespaceConfig reconciling started {"namespaceconfig": "/test-env-admins"}
[et2448@Davids-Work-MacBook-Pro applogs (⎈ |default/api-os-global-finods-com:6443/david.karlsen@evry.com:openshift-operators)]$
what fails is the upgrade of the operator, so that the new version isn't running
make sure the operator group contains no namespaces
Sorry I don't understand clearly, what do you mean by that?
oc get operatorgroup -A
NAMESPACE NAME AGE
argocd argocd-w7h4m 80d
grafana-operator grafana-operator-gkwv2 27d
group-sync-operator group-sync-operator-5jl7m 57d
oadp-operator oadp-operator-db69g 19h
openshift-logging openshift-logging-t99zq 6d18h
openshift-monitoring openshift-cluster-monitoring 90d
openshift-node-problem-detector openshift-node-problem-detector-xszz8 80d
openshift-operator-lifecycle-manager olm-operators 90d
openshift-operators-redhat openshift-operators-redhat-gvffw 6d18h
openshift-operators global-operators 90d
openshift-serverless openshift-serverless-4kx6g 55d
[et2448@Davids-Work-MacBook-Pro tf-ecr (⎈ |default/api-os-global-finods-com:6443/david.karlsen@evry.com:openshift-operators)]$
oc -n openshift-operators get operatorgroup global-operators -o yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
annotations:
olm.providedAPIs: AwsEventSources.v1alpha1.sources.triggermesh.com,GroupConfig.v1alpha1.redhatcop.redhat.io,Jaeger.v1.jaegertracing.io,Kiali.v1alpha1.kiali.io,MonitoringDashboard.v1alpha1.monitoring.kiali.io,NamespaceConfig.v1alpha1.redhatcop.redhat.io,ServiceMeshControlPlane.v2.maistra.io,ServiceMeshMember.v1.maistra.io,ServiceMeshMemberRoll.v1.maistra.io,UserConfig.v1alpha1.redhatcop.redhat.io
creationTimestamp: "2020-12-04T11:22:28Z"
generation: 2
name: global-operators
namespace: openshift-operators
resourceVersion: "361844559"
selfLink: /apis/operators.coreos.com/v1/namespaces/openshift-operators/operatorgroups/global-operators
uid: cb5e3d9a-9bc7-49df-bfe9-e40886ffec0b
spec: {}
status:
lastUpdated: "2020-12-04T11:26:56Z"
namespaces:
- ""
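For reference, the output above has the shape of an all-namespaces (global) OperatorGroup: `spec` is empty and `status.namespaces` holds a single empty string. A small sketch of that check, assuming the object has been loaded into a dict (for example from the same `oc` command with `-o json`). `targets_all_namespaces` is a hypothetical helper name, not an OLM API:

```python
def targets_all_namespaces(operator_group):
    """Return True if the OperatorGroup dict looks like a global one.

    Assumption: a global group sets neither spec.targetNamespaces nor
    spec.selector, and reports status.namespaces as [""].
    """
    spec = operator_group.get("spec") or {}
    if spec.get("targetNamespaces") or spec.get("selector"):
        return False
    namespaces = (operator_group.get("status") or {}).get("namespaces") or []
    return namespaces == [""]


# Trimmed copy of the global-operators object shown above.
GLOBAL_OPERATORS = {
    "apiVersion": "operators.coreos.com/v1",
    "kind": "OperatorGroup",
    "metadata": {"name": "global-operators", "namespace": "openshift-operators"},
    "spec": {},
    "status": {"lastUpdated": "2020-12-04T11:26:56Z", "namespaces": [""]},
}

if __name__ == "__main__":
    print(targets_all_namespaces(GLOBAL_OPERATORS))
```

By this check, `global-operators` already lists no specific namespaces.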
Tried uninstalling it and installing from scratch, I see these events:
k get events
LAST SEEN TYPE REASON OBJECT MESSAGE
29m Normal LeaderElection configmap/b0b2f089.redhat.io namespace-configuration-operator-controller-manager-6846f6bm25s_2ff00d95-af9c-4cd3-9638-e0c1af12d981 became leader
76s Normal Killing pod/cert-utils-operator-controller-manager-79c8f8bfd8-f7x2x Stopping container manager
3s Normal Scheduled pod/cert-utils-operator-controller-manager-79c8f8bfd8-s8btv Successfully assigned openshift-operators/cert-utils-operator-controller-manager-79c8f8bfd8-s8btv to alt-ksx-g-c01oco03
2s Normal AddedInterface pod/cert-utils-operator-controller-manager-79c8f8bfd8-s8btv Add eth0 [10.200.9.180/23]
2s Normal Pulled pod/cert-utils-operator-controller-manager-79c8f8bfd8-s8btv Container image "quay.io/redhat-cop/cert-utils-operator:v1.0.1" already present on machine
2s Normal Created pod/cert-utils-operator-controller-manager-79c8f8bfd8-s8btv Created container manager
2s Normal Started pod/cert-utils-operator-controller-manager-79c8f8bfd8-s8btv Started container manager
4s Normal SuccessfulCreate replicaset/cert-utils-operator-controller-manager-79c8f8bfd8 Created pod: cert-utils-operator-controller-manager-79c8f8bfd8-s8btv
4s Normal ScalingReplicaSet deployment/cert-utils-operator-controller-manager Scaled up replica set cert-utils-operator-controller-manager-79c8f8bfd8 to 1
30m Normal RequirementsNotMet clusterserviceversion/cert-utils-operator.v1.0.1 one or more requirements couldn't be found
30m Warning RequirementsNotMet clusterserviceversion/cert-utils-operator.v1.0.1 requirements no longer met
30m Normal RequirementsNotMet clusterserviceversion/cert-utils-operator.v1.0.1 requirements not met
6s Normal RequirementsUnknown clusterserviceversion/cert-utils-operator.v1.0.1 requirements not yet checked
6s Normal RequirementsNotMet clusterserviceversion/cert-utils-operator.v1.0.1 one or more requirements couldn't be found
5s Normal AllRequirementsMet clusterserviceversion/cert-utils-operator.v1.0.1 all requirements found, attempting install
4s Normal InstallSucceeded clusterserviceversion/cert-utils-operator.v1.0.1 waiting for install components to report healthy
2s Normal InstallWaiting clusterserviceversion/cert-utils-operator.v1.0.1 installing: waiting for deployment cert-utils-operator-controller-manager to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...
30m Normal SuccessfulCreate replicaset/namespace-configuration-operator-controller-manager-6846f66858 Created pod: namespace-configuration-operator-controller-manager-6846f6bm25s
30m Normal Scheduled pod/namespace-configuration-operator-controller-manager-6846f6bm25s Successfully assigned openshift-operators/namespace-configuration-operator-controller-manager-6846f6bm25s to alt-ksx-g-c01oco03
30m Normal AddedInterface pod/namespace-configuration-operator-controller-manager-6846f6bm25s Add eth0 [10.200.9.166/23]
30m Normal Pulling pod/namespace-configuration-operator-controller-manager-6846f6bm25s Pulling image "quay.io/redhat-cop/namespace-configuration-operator:v1.0.1"
30m Normal Pulled pod/namespace-configuration-operator-controller-manager-6846f6bm25s Successfully pulled image "quay.io/redhat-cop/namespace-configuration-operator:v1.0.1" in 10.62747128s
30m Normal Created pod/namespace-configuration-operator-controller-manager-6846f6bm25s Created container manager
30m Normal Started pod/namespace-configuration-operator-controller-manager-6846f6bm25s Started container manager
125m Warning Unhealthy pod/namespace-configuration-operator-controller-manager-6846f6gt4tl Readiness probe failed: Get "http://10.200.10.53:8081/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
56m Warning Unhealthy pod/namespace-configuration-operator-controller-manager-6846f6gt4tl Liveness probe failed: Get "http://10.200.10.53:8081/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
32m Normal Killing pod/namespace-configuration-operator-controller-manager-6846f6gt4tl Stopping container manager
30m Normal ScalingReplicaSet deployment/namespace-configuration-operator-controller-manager Scaled up replica set namespace-configuration-operator-controller-manager-6846f66858 to 1
30m Normal RequirementsUnknown clusterserviceversion/namespace-configuration-operator.v1.0.1 requirements not yet checked
1s Normal RequirementsNotMet clusterserviceversion/namespace-configuration-operator.v1.0.1 one or more requirements couldn't be found
30m Normal AllRequirementsMet clusterserviceversion/namespace-configuration-operator.v1.0.1 all requirements found, attempting install
30m Normal InstallSucceeded clusterserviceversion/namespace-configuration-operator.v1.0.1 waiting for install components to report healthy
30m Normal InstallWaiting clusterserviceversion/namespace-configuration-operator.v1.0.1 installing: waiting for deployment namespace-configuration-operator-controller-manager to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...
30m Normal InstallSucceeded clusterserviceversion/namespace-configuration-operator.v1.0.1 install strategy completed with no errors
2s Warning RequirementsNotMet clusterserviceversion/namespace-configuration-operator.v1.0.1 requirements no longer met
1s Normal RequirementsNotMet clusterserviceversion/namespace-configuration-operator.v1.0.1 requirements not met
it seems to run just fine though:
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 32m default-scheduler Successfully assigned openshift-operators/namespace-configuration-operator-controller-manager-6846f6bm25s to alt-ksx-g-c01oco03
Normal AddedInterface 32m multus Add eth0 [10.200.9.166/23]
Normal Pulling 32m kubelet Pulling image "quay.io/redhat-cop/namespace-configuration-operator:v1.0.1"
Normal Pulled 32m kubelet Successfully pulled image "quay.io/redhat-cop/namespace-configuration-operator:v1.0.1" in 10.62747128s
Normal Created 32m kubelet Created container manager
Normal Started 32m kubelet Started container manager
but in the operator console it says:
Pending
Up to date
This is maybe interesting, note the serviceaccount:
Requirement Status:
Group: apiextensions.k8s.io
Kind: CustomResourceDefinition
Message: CRD is present and Established condition is true
Name: groupconfigs.redhatcop.redhat.io
Status: Present
Uuid: f9a67af1-feb7-4a16-ad10-accc7ad4deab
Version: v1
Group: apiextensions.k8s.io
Kind: CustomResourceDefinition
Message: CRD is present and Established condition is true
Name: namespaceconfigs.redhatcop.redhat.io
Status: Present
Uuid: 2ffe40ca-6f67-404b-83a8-09dab580befe
Version: v1
Group: apiextensions.k8s.io
Kind: CustomResourceDefinition
Message: CRD is present and Established condition is true
Name: userconfigs.redhatcop.redhat.io
Status: Present
Uuid: 00cdcd1c-40c5-4c0c-9ff8-7333c31b7b75
Version: v1
Group:
Kind: ServiceAccount
Message: Service account is not owned by this ClusterServiceVersion
Name: default
Status: PresentNotSatisfied
Version: v1
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal RequirementsUnknown 34m operator-lifecycle-manager requirements not yet checked
Normal AllRequirementsMet 34m operator-lifecycle-manager all requirements found, attempting install
Normal InstallSucceeded 34m operator-lifecycle-manager waiting for install components to report healthy
Normal InstallWaiting 34m (x2 over 34m) operator-lifecycle-manager installing: waiting for deployment namespace-configuration-operator-controller-manager to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...
Normal InstallSucceeded 34m operator-lifecycle-manager install strategy completed with no errors
Warning RequirementsNotMet 4m21s operator-lifecycle-manager requirements no longer met
Normal RequirementsNotMet 4m20s (x2 over 34m) operator-lifecycle-manager one or more requirements couldn't be found
Normal RequirementsNotMet 4m20s operator-lifecycle-manager requirements not met
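The question from the top of the thread ("what's not met and why?") can be read straight off the Requirement Status listing above by filtering for entries whose status is not `Present`. A minimal sketch, assuming the CSV has been exported as JSON and that the entries sit under `status.requirementStatus` with lower-case keys (`kind`, `name`, `status`, `message`). `unmet_requirements` is a hypothetical helper, and the sample is trimmed from the listing above:

```python
def unmet_requirements(csv):
    """Return the requirement entries whose status is anything but 'Present'."""
    requirements = (csv.get("status") or {}).get("requirementStatus") or []
    return [r for r in requirements if r.get("status") != "Present"]


# Trimmed from the Requirement Status listing above.
SAMPLE_CSV = {
    "status": {
        "requirementStatus": [
            {"group": "apiextensions.k8s.io", "kind": "CustomResourceDefinition",
             "name": "groupconfigs.redhatcop.redhat.io", "status": "Present",
             "message": "CRD is present and Established condition is true"},
            {"group": "apiextensions.k8s.io", "kind": "CustomResourceDefinition",
             "name": "namespaceconfigs.redhatcop.redhat.io", "status": "Present",
             "message": "CRD is present and Established condition is true"},
            {"group": "apiextensions.k8s.io", "kind": "CustomResourceDefinition",
             "name": "userconfigs.redhatcop.redhat.io", "status": "Present",
             "message": "CRD is present and Established condition is true"},
            {"group": "", "kind": "ServiceAccount", "name": "default",
             "status": "PresentNotSatisfied",
             "message": "Service account is not owned by this ClusterServiceVersion"},
        ]
    }
}

if __name__ == "__main__":
    for req in unmet_requirements(SAMPLE_CSV):
        print(f'{req["kind"]}/{req["name"]}: {req["status"]} ({req["message"]})')
```

On this sample the only entry left is the `default` ServiceAccount, with the message that it is not owned by this ClusterServiceVersion.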
@raffaelespazzoli it's fixed upstream now: operator-framework/operator-lifecycle-manager#2028
ah, but it's not a library, nvm.
may I close this?
Yeah, OLM issue so closing here. Thanks.