redhat-cop/namespace-configuration-operator

nsconfigurator stuck in upgrade

davidkarlsen opened this issue · 19 comments

It just sits there, reporting
"install strategy completed with no errors"
but also
"one or more requirements couldn't be found"

What's not met, and why? Why did it upgrade in the first place if the requirements cannot be met?

   Reason:                InstallSucceeded
    Last Transition Time:  2021-02-26T23:17:11Z
    Last Update Time:      2021-02-26T23:17:12Z
    Message:               installing: waiting for deployment namespace-configuration-operator-controller-manager to become ready: Waiting for rollout to finish: 1 old replicas are pending termination...

    Phase:                 Installing
    Reason:                InstallWaiting
    Last Transition Time:  2021-02-26T23:17:34Z
    Last Update Time:      2021-02-26T23:17:34Z
    Message:               install strategy completed with no errors
    Phase:                 Succeeded
    Reason:                InstallSucceeded
    Last Transition Time:  2021-03-02T22:50:12Z
    Last Update Time:      2021-03-02T22:50:12Z
    Message:               installing: waiting for deployment namespace-configuration-operator-controller-manager to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...

    Phase:                 Failed
    Reason:                ComponentUnhealthy
    Last Transition Time:  2021-03-02T22:50:14Z
    Last Update Time:      2021-03-02T22:50:14Z
    Message:               installing: waiting for deployment namespace-configuration-operator-controller-manager to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...

    Phase:                 Pending
    Reason:                NeedsReinstall
    Last Transition Time:  2021-03-02T22:50:15Z
    Last Update Time:      2021-03-02T22:50:15Z
    Message:               all requirements found, attempting install
    Phase:                 InstallReady
    Reason:                AllRequirementsMet
    Last Transition Time:  2021-03-02T22:50:17Z
    Last Update Time:      2021-03-02T22:50:17Z
    Message:               waiting for install components to report healthy
    Phase:                 Installing
    Reason:                InstallSucceeded
    Last Transition Time:  2021-03-02T22:50:17Z
    Last Update Time:      2021-03-02T22:50:18Z
    Message:               installing: waiting for deployment namespace-configuration-operator-controller-manager to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...

    Phase:                 Installing
    Reason:                InstallWaiting
    Last Transition Time:  2021-03-02T22:51:02Z
    Last Update Time:      2021-03-02T22:51:02Z
    Message:               install strategy completed with no errors
    Phase:                 Succeeded
    Reason:                InstallSucceeded
    Last Transition Time:  2021-03-03T18:12:23Z
    Last Update Time:      2021-03-03T18:12:23Z
    Message:               requirements no longer met
    Phase:                 Failed
    Reason:                RequirementsNotMet
    Last Transition Time:  2021-03-03T18:12:28Z
    Last Update Time:      2021-03-03T18:12:28Z
    Message:               requirements not met
    Phase:                 Pending
    Reason:                RequirementsNotMet
  Last Transition Time:    2021-03-03T18:12:28Z
  Last Update Time:        2021-03-03T18:12:28Z
  Message:                 one or more requirements couldn't be found
  Phase:                   Pending
  Reason:                  RequirementsNotMet

@davidkarlsen thank you for reaching out. Could you give us some additional information about your cluster?

  1. Is it vanilla k8s, OpenShift, etc.?
  2. Did you install through the Operator Hub console or manually via the CLI?
  3. Can you run kubectl get events in the namespace where you installed the operator and paste the output here?

@davidkarlsen thank you for reaching out. Could you give us some additional information about your cluster?

  1. Is it vanilla k8s, OpenShift, etc.?

OpenShift v4.6.latest

  2. Did you install through the Operator Hub console or manually via the CLI?

Console, using auto-upgrade.

  3. Can you run kubectl get events in the namespace where you installed the operator and paste the output here?

namespace-configuration-operator-controller-manager-6846f6gt4tl 1/1 Running 0 33h
[et2448@Davids-Work-MacBook-Pro base-ubuntu (⎈ |kube-system/api-os-global-finods-com:6443/david.karlsen@evry.com:openshift-operators)]$ k get events
LAST SEEN TYPE REASON OBJECT MESSAGE
92m Warning Unhealthy pod/namespace-configuration-operator-controller-manager-6846f6gt4tl Readiness probe failed: Get "http://10.200.10.53:8081/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Can you share the operator logs?

well, the running operator is the old version, so that's maybe not so interesting.

2021-03-04T12:39:27.287Z        INFO    controllers.NamespaceConfig     reconciling started     {"namespaceconfig": "/fss-apps"}
2021-03-04T12:39:27.298Z        INFO    controllers.NamespaceConfig     reconciling started     {"namespaceconfig": "/deployer-role"}
2021-03-04T12:42:40.922Z        INFO    controllers.NamespaceConfig     reconciling started     {"namespaceconfig": "/deployer-role"}
2021-03-04T12:42:40.933Z        INFO    controllers.NamespaceConfig     reconciling started     {"namespaceconfig": "/deployer-role"}
2021-03-04T12:42:40.945Z        INFO    controllers.NamespaceConfig     reconciling started     {"namespaceconfig": "/dev-env-admins"}
2021-03-04T12:42:40.955Z        INFO    controllers.NamespaceConfig     reconciling started     {"namespaceconfig": "/dev-env-admins"}
2021-03-04T12:42:40.967Z        INFO    controllers.NamespaceConfig     reconciling started     {"namespaceconfig": "/fss-apps"}
2021-03-04T12:42:40.981Z        INFO    controllers.NamespaceConfig     reconciling started     {"namespaceconfig": "/resource-quota-large"}
2021-03-04T12:42:40.990Z        INFO    controllers.NamespaceConfig     reconciling started     {"namespaceconfig": "/fss-apps"}
2021-03-04T12:42:41.002Z        INFO    controllers.NamespaceConfig     reconciling started     {"namespaceconfig": "/resource-quota-large"}
2021-03-04T12:42:41.012Z        INFO    controllers.NamespaceConfig     reconciling started     {"namespaceconfig": "/resource-quota-medium"}
2021-03-04T12:42:41.020Z        INFO    controllers.NamespaceConfig     reconciling started     {"namespaceconfig": "/resource-quota-small"}
2021-03-04T12:42:41.028Z        INFO    controllers.NamespaceConfig     reconciling started     {"namespaceconfig": "/resource-quota-medium"}
2021-03-04T12:42:41.036Z        INFO    controllers.NamespaceConfig     reconciling started     {"namespaceconfig": "/resource-quota-small"}
2021-03-04T12:42:41.047Z        INFO    controllers.NamespaceConfig     reconciling started     {"namespaceconfig": "/test-env-admins"}
2021-03-04T12:42:41.059Z        INFO    controllers.NamespaceConfig     reconciling started     {"namespaceconfig": "/test-env-admins"}
[et2448@Davids-Work-MacBook-Pro applogs (⎈ |default/api-os-global-finods-com:6443/david.karlsen@evry.com:openshift-operators)]$ 

what fails is the upgrade of the operator, so that the new version isn't running

make sure the operator group contains no namespaces

Sorry, I don't quite understand. What do you mean by that?

oc get operatorgroup -A
NAMESPACE                              NAME                                    AGE
argocd                                 argocd-w7h4m                            80d
grafana-operator                       grafana-operator-gkwv2                  27d
group-sync-operator                    group-sync-operator-5jl7m               57d
oadp-operator                          oadp-operator-db69g                     19h
openshift-logging                      openshift-logging-t99zq                 6d18h
openshift-monitoring                   openshift-cluster-monitoring            90d
openshift-node-problem-detector        openshift-node-problem-detector-xszz8   80d
openshift-operator-lifecycle-manager   olm-operators                           90d
openshift-operators-redhat             openshift-operators-redhat-gvffw        6d18h
openshift-operators                    global-operators                        90d
openshift-serverless                   openshift-serverless-4kx6g              55d
[et2448@Davids-Work-MacBook-Pro tf-ecr (⎈ |default/api-os-global-finods-com:6443/david.karlsen@evry.com:openshift-operators)]$ 
oc -n openshift-operators  get operatorgroup  global-operators  -o yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  annotations:
    olm.providedAPIs: AwsEventSources.v1alpha1.sources.triggermesh.com,GroupConfig.v1alpha1.redhatcop.redhat.io,Jaeger.v1.jaegertracing.io,Kiali.v1alpha1.kiali.io,MonitoringDashboard.v1alpha1.monitoring.kiali.io,NamespaceConfig.v1alpha1.redhatcop.redhat.io,ServiceMeshControlPlane.v2.maistra.io,ServiceMeshMember.v1.maistra.io,ServiceMeshMemberRoll.v1.maistra.io,UserConfig.v1alpha1.redhatcop.redhat.io
  creationTimestamp: "2020-12-04T11:22:28Z"
  generation: 2
  name: global-operators
  namespace: openshift-operators
  resourceVersion: "361844559"
  selfLink: /apis/operators.coreos.com/v1/namespaces/openshift-operators/operatorgroups/global-operators
  uid: cb5e3d9a-9bc7-49df-bfe9-e40886ffec0b
spec: {}
status:
  lastUpdated: "2020-12-04T11:26:56Z"
  namespaces:
  - ""
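
The "no namespaces" advice above can be checked mechanically. The sketch below is a minimal, hypothetical helper (the name `targets_all_namespaces` is not part of OLM) that takes an OperatorGroup as a dict, for example parsed from `oc get operatorgroup global-operators -o json`, and reports whether it selects every namespace. It assumes that an empty `spec` (no `targetNamespaces`, no `selector`) together with a `status.namespaces` list holding a single empty string means "all namespaces", which matches the output above.

```python
def targets_all_namespaces(operator_group: dict) -> bool:
    """Return True if the OperatorGroup selects every namespace (assumed convention)."""
    spec = operator_group.get("spec") or {}
    status = operator_group.get("status") or {}
    # No explicit target list and no label selector in the spec.
    no_explicit_targets = not spec.get("targetNamespaces") and not spec.get("selector")
    # A single empty string in status.namespaces is taken to mean "all namespaces".
    return no_explicit_targets and status.get("namespaces") == [""]


# Values copied from the global-operators output in this thread.
global_operators = {
    "metadata": {"name": "global-operators", "namespace": "openshift-operators"},
    "spec": {},
    "status": {"lastUpdated": "2020-12-04T11:26:56Z", "namespaces": [""]},
}

# Hypothetical single-namespace group, included only for contrast.
single_namespace = {
    "spec": {"targetNamespaces": ["argocd"]},
    "status": {"namespaces": ["argocd"]},
}

print(targets_all_namespaces(global_operators))
print(targets_all_namespaces(single_namespace))
```

By this check, the `global-operators` group shown here already contains no explicit namespaces.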

I tried uninstalling it and installing from scratch. I see these events:

k get events
LAST SEEN   TYPE      REASON                OBJECT                                                                      MESSAGE
29m         Normal    LeaderElection        configmap/b0b2f089.redhat.io                                                namespace-configuration-operator-controller-manager-6846f6bm25s_2ff00d95-af9c-4cd3-9638-e0c1af12d981 became leader
76s         Normal    Killing               pod/cert-utils-operator-controller-manager-79c8f8bfd8-f7x2x                 Stopping container manager
3s          Normal    Scheduled             pod/cert-utils-operator-controller-manager-79c8f8bfd8-s8btv                 Successfully assigned openshift-operators/cert-utils-operator-controller-manager-79c8f8bfd8-s8btv to alt-ksx-g-c01oco03
2s          Normal    AddedInterface        pod/cert-utils-operator-controller-manager-79c8f8bfd8-s8btv                 Add eth0 [10.200.9.180/23]
2s          Normal    Pulled                pod/cert-utils-operator-controller-manager-79c8f8bfd8-s8btv                 Container image "quay.io/redhat-cop/cert-utils-operator:v1.0.1" already present on machine
2s          Normal    Created               pod/cert-utils-operator-controller-manager-79c8f8bfd8-s8btv                 Created container manager
2s          Normal    Started               pod/cert-utils-operator-controller-manager-79c8f8bfd8-s8btv                 Started container manager
4s          Normal    SuccessfulCreate      replicaset/cert-utils-operator-controller-manager-79c8f8bfd8                Created pod: cert-utils-operator-controller-manager-79c8f8bfd8-s8btv
4s          Normal    ScalingReplicaSet     deployment/cert-utils-operator-controller-manager                           Scaled up replica set cert-utils-operator-controller-manager-79c8f8bfd8 to 1
30m         Normal    RequirementsNotMet    clusterserviceversion/cert-utils-operator.v1.0.1                            one or more requirements couldn't be found
30m         Warning   RequirementsNotMet    clusterserviceversion/cert-utils-operator.v1.0.1                            requirements no longer met
30m         Normal    RequirementsNotMet    clusterserviceversion/cert-utils-operator.v1.0.1                            requirements not met
6s          Normal    RequirementsUnknown   clusterserviceversion/cert-utils-operator.v1.0.1                            requirements not yet checked
6s          Normal    RequirementsNotMet    clusterserviceversion/cert-utils-operator.v1.0.1                            one or more requirements couldn't be found
5s          Normal    AllRequirementsMet    clusterserviceversion/cert-utils-operator.v1.0.1                            all requirements found, attempting install
4s          Normal    InstallSucceeded      clusterserviceversion/cert-utils-operator.v1.0.1                            waiting for install components to report healthy
2s          Normal    InstallWaiting        clusterserviceversion/cert-utils-operator.v1.0.1                            installing: waiting for deployment cert-utils-operator-controller-manager to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...
30m         Normal    SuccessfulCreate      replicaset/namespace-configuration-operator-controller-manager-6846f66858   Created pod: namespace-configuration-operator-controller-manager-6846f6bm25s
30m         Normal    Scheduled             pod/namespace-configuration-operator-controller-manager-6846f6bm25s         Successfully assigned openshift-operators/namespace-configuration-operator-controller-manager-6846f6bm25s to alt-ksx-g-c01oco03
30m         Normal    AddedInterface        pod/namespace-configuration-operator-controller-manager-6846f6bm25s         Add eth0 [10.200.9.166/23]
30m         Normal    Pulling               pod/namespace-configuration-operator-controller-manager-6846f6bm25s         Pulling image "quay.io/redhat-cop/namespace-configuration-operator:v1.0.1"
30m         Normal    Pulled                pod/namespace-configuration-operator-controller-manager-6846f6bm25s         Successfully pulled image "quay.io/redhat-cop/namespace-configuration-operator:v1.0.1" in 10.62747128s
30m         Normal    Created               pod/namespace-configuration-operator-controller-manager-6846f6bm25s         Created container manager
30m         Normal    Started               pod/namespace-configuration-operator-controller-manager-6846f6bm25s         Started container manager
125m        Warning   Unhealthy             pod/namespace-configuration-operator-controller-manager-6846f6gt4tl         Readiness probe failed: Get "http://10.200.10.53:8081/readyz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
56m         Warning   Unhealthy             pod/namespace-configuration-operator-controller-manager-6846f6gt4tl         Liveness probe failed: Get "http://10.200.10.53:8081/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
32m         Normal    Killing               pod/namespace-configuration-operator-controller-manager-6846f6gt4tl         Stopping container manager
30m         Normal    ScalingReplicaSet     deployment/namespace-configuration-operator-controller-manager              Scaled up replica set namespace-configuration-operator-controller-manager-6846f66858 to 1
30m         Normal    RequirementsUnknown   clusterserviceversion/namespace-configuration-operator.v1.0.1               requirements not yet checked
1s          Normal    RequirementsNotMet    clusterserviceversion/namespace-configuration-operator.v1.0.1               one or more requirements couldn't be found
30m         Normal    AllRequirementsMet    clusterserviceversion/namespace-configuration-operator.v1.0.1               all requirements found, attempting install
30m         Normal    InstallSucceeded      clusterserviceversion/namespace-configuration-operator.v1.0.1               waiting for install components to report healthy
30m         Normal    InstallWaiting        clusterserviceversion/namespace-configuration-operator.v1.0.1               installing: waiting for deployment namespace-configuration-operator-controller-manager to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...
30m         Normal    InstallSucceeded      clusterserviceversion/namespace-configuration-operator.v1.0.1               install strategy completed with no errors
2s          Warning   RequirementsNotMet    clusterserviceversion/namespace-configuration-operator.v1.0.1               requirements no longer met
1s          Normal    RequirementsNotMet    clusterserviceversion/namespace-configuration-operator.v1.0.1               requirements not met

it seems to run just fine though:

                node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason          Age   From               Message
  ----    ------          ----  ----               -------
  Normal  Scheduled       32m   default-scheduler  Successfully assigned openshift-operators/namespace-configuration-operator-controller-manager-6846f6bm25s to alt-ksx-g-c01oco03
  Normal  AddedInterface  32m   multus             Add eth0 [10.200.9.166/23]
  Normal  Pulling         32m   kubelet            Pulling image "quay.io/redhat-cop/namespace-configuration-operator:v1.0.1"
  Normal  Pulled          32m   kubelet            Successfully pulled image "quay.io/redhat-cop/namespace-configuration-operator:v1.0.1" in 10.62747128s
  Normal  Created         32m   kubelet            Created container manager
  Normal  Started         32m   kubelet            Started container manager

but in the operator console it says:
Pending
Up to date

This is maybe interesting, note the serviceaccount:

  Requirement Status:
    Group:    apiextensions.k8s.io
    Kind:     CustomResourceDefinition
    Message:  CRD is present and Established condition is true
    Name:     groupconfigs.redhatcop.redhat.io
    Status:   Present
    Uuid:     f9a67af1-feb7-4a16-ad10-accc7ad4deab
    Version:  v1
    Group:    apiextensions.k8s.io
    Kind:     CustomResourceDefinition
    Message:  CRD is present and Established condition is true
    Name:     namespaceconfigs.redhatcop.redhat.io
    Status:   Present
    Uuid:     2ffe40ca-6f67-404b-83a8-09dab580befe
    Version:  v1
    Group:    apiextensions.k8s.io
    Kind:     CustomResourceDefinition
    Message:  CRD is present and Established condition is true
    Name:     userconfigs.redhatcop.redhat.io
    Status:   Present
    Uuid:     00cdcd1c-40c5-4c0c-9ff8-7333c31b7b75
    Version:  v1
    Group:    
    Kind:     ServiceAccount
    Message:  Service account is not owned by this ClusterServiceVersion
    Name:     default
    Status:   PresentNotSatisfied
    Version:  v1
Events:
  Type     Reason               Age                  From                        Message
  ----     ------               ----                 ----                        -------
  Normal   RequirementsUnknown  34m                  operator-lifecycle-manager  requirements not yet checked
  Normal   AllRequirementsMet   34m                  operator-lifecycle-manager  all requirements found, attempting install
  Normal   InstallSucceeded     34m                  operator-lifecycle-manager  waiting for install components to report healthy
  Normal   InstallWaiting       34m (x2 over 34m)    operator-lifecycle-manager  installing: waiting for deployment namespace-configuration-operator-controller-manager to become ready: Waiting for rollout to finish: 0 of 1 updated replicas are available...
  Normal   InstallSucceeded     34m                  operator-lifecycle-manager  install strategy completed with no errors
  Warning  RequirementsNotMet   4m21s                operator-lifecycle-manager  requirements no longer met
  Normal   RequirementsNotMet   4m20s (x2 over 34m)  operator-lifecycle-manager  one or more requirements couldn't be found
  Normal   RequirementsNotMet   4m20s                operator-lifecycle-manager  requirements not met
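
To answer "what's not met" without reading the whole describe output, the requirement list can be filtered. This is a minimal sketch, assuming the CSV has been fetched as JSON (for example with `oc get csv namespace-configuration-operator.v1.0.1 -o json`) and that `status.requirementStatus` carries the same fields as the Requirement Status block above, with lower-case keys. The helper name `unmet_requirements` is made up for illustration, and the sample data is copied from the output above.

```python
def unmet_requirements(csv_status: dict) -> list:
    """Return the requirement entries whose status is anything other than 'Present'."""
    return [
        req
        for req in csv_status.get("requirementStatus", [])
        if req.get("status") != "Present"
    ]


# Sample data mirroring the Requirement Status block in this thread.
status = {
    "requirementStatus": [
        {"group": "apiextensions.k8s.io", "kind": "CustomResourceDefinition",
         "name": "groupconfigs.redhatcop.redhat.io", "status": "Present",
         "message": "CRD is present and Established condition is true"},
        {"group": "apiextensions.k8s.io", "kind": "CustomResourceDefinition",
         "name": "namespaceconfigs.redhatcop.redhat.io", "status": "Present",
         "message": "CRD is present and Established condition is true"},
        {"group": "apiextensions.k8s.io", "kind": "CustomResourceDefinition",
         "name": "userconfigs.redhatcop.redhat.io", "status": "Present",
         "message": "CRD is present and Established condition is true"},
        {"group": "", "kind": "ServiceAccount", "name": "default",
         "status": "PresentNotSatisfied",
         "message": "Service account is not owned by this ClusterServiceVersion"},
    ]
}

for req in unmet_requirements(status):
    print(f'{req["kind"]}/{req["name"]}: {req["status"]} ({req["message"]})')
```

For the data in this thread it prints only the `default` ServiceAccount entry, the one requirement reported as `PresentNotSatisfied`.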

Screenshot 2021-03-04 at 16 08 16

The namespace can't be changed at install time; it is fixed.

ah, but it's not a library, nvm.

may I close this?

Yeah, OLM issue so closing here. Thanks.