openshift/machine-config-operator

Error while reconciling 4.10.0-0.okd-2022-03-07-131213: the cluster operator machine-config is degraded

skywidetech opened this issue · 4 comments

Description

After a fresh install of OKD 4.10.0-0.okd-2022-03-07-131213 on VMware using the bare-metal installation method, the machine-config cluster operator is degraded, and the console shows the following error:
"Error while reconciling 4.10.0-0.okd-2022-03-07-131213: the cluster operator machine-config is degraded"

Could anybody give me a hint on how to solve it?

Output of oc -n openshift-machine-config-operator logs deploy/machine-config-controller:

[root@mgr .kube]# oc -n openshift-machine-config-operator logs deploy/machine-config-controller
I0712 10:36:33.169938       1 start.go:50] Version: v4.10.0-202203040217.p0.g14a1ca2.assembly.stream-dirty (14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22)
I0712 10:36:34.192318       1 leaderelection.go:248] attempting to acquire leader lease openshift-machine-config-operator/machine-config-controller...
I0712 10:36:34.197532       1 leaderelection.go:258] successfully acquired lease openshift-machine-config-operator/machine-config-controller
E0712 10:36:34.214908       1 render_controller.go:185] error finding pools for machineconfig: could not find any MachineConfigPool set for MachineConfig 99-worker-okd-extensions with labels: map[machineconfiguration.openshift.io/role:worker]
E0712 10:36:34.215001       1 render_controller.go:185] error finding pools for machineconfig: could not find any MachineConfigPool set for MachineConfig 99-master-ssh with labels: map[machineconfiguration.openshift.io/role:master]
E0712 10:36:34.215036       1 render_controller.go:185] error finding pools for machineconfig: could not find any MachineConfigPool set for MachineConfig 99-master-okd-extensions with labels: map[machineconfiguration.openshift.io/role:master]
E0712 10:36:34.215070       1 render_controller.go:185] error finding pools for machineconfig: could not find any MachineConfigPool set for MachineConfig 99-worker-ssh with labels: map[machineconfiguration.openshift.io/role:worker]
E0712 10:36:34.215098       1 render_controller.go:185] error finding pools for machineconfig: could not find any MachineConfigPool set for MachineConfig 99-okd-master-disable-mitigations with labels: map[machineconfiguration.openshift.io/role:master]
E0712 10:36:34.215121       1 render_controller.go:185] error finding pools for machineconfig: could not find any MachineConfigPool set for MachineConfig 99-okd-worker-disable-mitigations with labels: map[machineconfiguration.openshift.io/role:worker]
I0712 10:36:34.216720       1 template_controller.go:137] Re-syncing ControllerConfig due to secret pull-secret change
I0712 10:36:34.250308       1 node_controller.go:419] Pool master: node master-0: changed labels
I0712 10:36:34.250323       1 node_controller.go:419] Pool master: node master-0: changed taints
I0712 10:36:34.307589       1 node_controller.go:152] Starting MachineConfigController-NodeController
I0712 10:36:34.307621       1 kubelet_config_controller.go:169] Starting MachineConfigController-KubeletConfigController
I0712 10:36:34.307645       1 container_runtime_config_controller.go:184] Starting MachineConfigController-ContainerRuntimeConfigController
I0712 10:36:34.307660       1 render_controller.go:124] Starting MachineConfigController-RenderController
I0712 10:36:34.307712       1 template_controller.go:238] Starting MachineConfigController-TemplateController
I0712 10:36:34.439405       1 container_runtime_config_controller.go:802] Applied ImageConfig cluster on MachineConfigPool master
I0712 10:36:34.543539       1 container_runtime_config_controller.go:802] Applied ImageConfig cluster on MachineConfigPool worker
I0712 10:36:39.216137       1 node_controller.go:723] Pool worker is unconfigured, pausing 5s for renderer to initialize
I0712 10:36:39.216290       1 node_controller.go:723] Pool master is unconfigured, pausing 5s for renderer to initialize
I0712 10:36:39.270148       1 render_controller.go:501] Generated machineconfig rendered-master-4b564212432dc4da216f588f03512603 from 7 configs: [{MachineConfig  00-master  machineconfiguration.openshift.io/v1  } {MachineConfig  01-master-container-runtime  machineconfiguration.openshift.io/v1  } {MachineConfig  01-master-kubelet  machineconfiguration.openshift.io/v1  } {MachineConfig  99-master-generated-registries  machineconfiguration.openshift.io/v1  } {MachineConfig  99-master-okd-extensions  machineconfiguration.openshift.io/v1  } {MachineConfig  99-master-ssh  machineconfiguration.openshift.io/v1  } {MachineConfig  99-okd-master-disable-mitigations  machineconfiguration.openshift.io/v1  }]
I0712 10:36:39.275148       1 event.go:285] Event(v1.ObjectReference{Kind:"MachineConfigPool", Namespace:"", Name:"master", UID:"8995dfde-b90b-4875-8577-a91b90b3a8aa", APIVersion:"machineconfiguration.openshift.io/v1", ResourceVersion:"4029", FieldPath:""}): type: 'Normal' reason: 'RenderedConfigGenerated' rendered-master-4b564212432dc4da216f588f03512603 successfully generated (release version: 4.10.0-0.okd-2022-03-07-131213, controller version: 14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22)
I0712 10:36:39.279142       1 render_controller.go:527] Pool master: now targeting: rendered-master-4b564212432dc4da216f588f03512603
I0712 10:36:39.285249       1 render_controller.go:377] Error syncing machineconfigpool master: Operation cannot be fulfilled on machineconfigpools.machineconfiguration.openshift.io "master": the object has been modified; please apply your changes to the latest version and try again
I0712 10:36:39.293819       1 render_controller.go:501] Generated machineconfig rendered-worker-852da369a901f1a2ea5020c564aaa1de from 7 configs: [{MachineConfig  00-worker  machineconfiguration.openshift.io/v1  } {MachineConfig  01-worker-container-runtime  machineconfiguration.openshift.io/v1  } {MachineConfig  01-worker-kubelet  machineconfiguration.openshift.io/v1  } {MachineConfig  99-okd-worker-disable-mitigations  machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-generated-registries  machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-okd-extensions  machineconfiguration.openshift.io/v1  } {MachineConfig  99-worker-ssh  machineconfiguration.openshift.io/v1  }]
I0712 10:36:39.294355       1 event.go:285] Event(v1.ObjectReference{Kind:"MachineConfigPool", Namespace:"", Name:"worker", UID:"29248953-aad9-4fbc-b399-eb0717b9edaf", APIVersion:"machineconfiguration.openshift.io/v1", ResourceVersion:"4030", FieldPath:""}): type: 'Normal' reason: 'RenderedConfigGenerated' rendered-worker-852da369a901f1a2ea5020c564aaa1de successfully generated (release version: 4.10.0-0.okd-2022-03-07-131213, controller version: 14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22)
I0712 10:36:39.300698       1 render_controller.go:527] Pool worker: now targeting: rendered-worker-852da369a901f1a2ea5020c564aaa1de
I0712 10:36:39.307056       1 render_controller.go:377] Error syncing machineconfigpool worker: Operation cannot be fulfilled on machineconfigpools.machineconfiguration.openshift.io "worker": the object has been modified; please apply your changes to the latest version and try again
I0712 10:36:44.310816       1 status.go:90] Pool worker: All nodes are updated with rendered-worker-852da369a901f1a2ea5020c564aaa1de
I0712 10:36:44.320696       1 node_controller.go:840] Updated controlPlaneTopology annotation of node master-0 from  to 
I0712 10:36:44.363718       1 node_controller.go:419] Pool master: node master-0: changed taints
I0712 10:37:56.330429       1 template_controller.go:137] Re-syncing ControllerConfig due to secret pull-secret change
I0712 10:40:23.783837       1 template_controller.go:137] Re-syncing ControllerConfig due to secret pull-secret change
I0712 10:44:01.856431       1 template_controller.go:137] Re-syncing ControllerConfig due to secret pull-secret change
I0712 11:10:14.017753       1 template_controller.go:137] Re-syncing ControllerConfig due to secret pull-secret change
I0712 11:36:26.179845       1 template_controller.go:137] Re-syncing ControllerConfig due to secret pull-secret change
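The `render_controller.go:185` errors near the top of the log were logged at 10:36:34, before the renderer had run. About five seconds later, the same MachineConfigs appear as sources of the rendered configs. The sketch below cross-checks this. It is a minimal example that assumes the controller log above has been saved to a string. `orphaned_configs` is a hypothetical helper, not part of `oc` or the operator.

```python
import re

# Startup error: the MachineConfig exists, but no pool has selected it yet.
NO_POOL_RE = re.compile(
    r"could not find any MachineConfigPool set for MachineConfig (\S+) with labels"
)
# Render event: "Generated machineconfig <rendered-name> from N configs: [ ... ]"
RENDERED_RE = re.compile(r"Generated machineconfig \S+ from \d+ configs: \[(.*)\]")
# One source entry inside the brackets:
# "{MachineConfig  00-master  machineconfiguration.openshift.io/v1  }"
SOURCE_RE = re.compile(r"\{MachineConfig\s+(\S+)\s")


def orphaned_configs(log: str) -> set:
    """Return MachineConfigs that logged a 'no pool' error and never
    show up as a source of any rendered config in the same log."""
    no_pool = set(NO_POOL_RE.findall(log))
    used = set()
    for match in RENDERED_RE.finditer(log):
        used.update(SOURCE_RE.findall(match.group(1)))
    return no_pool - used
```

If you pass in the full log above, the function should return an empty set. Each of the six MachineConfigs named in those errors is listed as a source of either rendered-master-4b564212432dc4da216f588f03512603 or rendered-worker-852da369a901f1a2ea5020c564aaa1de. This suggests the startup errors were transient ordering noise rather than the cause of the degraded pool.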

Output of oc get -o yaml clusteroperator machine-config:

[root@mgr .kube]# oc get -o yaml clusteroperator machine-config
apiVersion: config.openshift.io/v1
kind: ClusterOperator
metadata:
  annotations:
    exclude.release.openshift.io/internal-openshift-hosted: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
  creationTimestamp: "2022-07-12T10:29:03Z"
  generation: 1
  name: machine-config
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
    uid: ebec5c6b-c137-4a9e-b165-5e29b0555f04
  resourceVersion: "40639"
  uid: 5c1f6763-e5d3-4c57-9287-92c43d933ea7
spec: {}
status:
  conditions:
  - lastTransitionTime: "2022-07-12T10:33:55Z"
    message: Working towards 4.10.0-0.okd-2022-03-07-131213
    status: "True"
    type: Progressing
  - lastTransitionTime: "2022-07-12T10:36:44Z"
    message: One or more machine config pools are degraded, please see `oc get mcp`
      for further details and resolve before upgrading
    reason: DegradedPool
    status: "False"
    type: Upgradeable
  - lastTransitionTime: "2022-07-12T10:46:38Z"
    message: 'Unable to apply 4.10.0-0.okd-2022-03-07-131213: timed out waiting for
      the condition during syncRequiredMachineConfigPools: error pool master is not
      ready, retrying. Status: (pool degraded: true total: 1, ready 0, updated: 0,
      unavailable: 1)'
    reason: RequiredPoolsFailed
    status: "True"
    type: Degraded
  - lastTransitionTime: "2022-07-12T10:46:38Z"
    message: Cluster has deployed []
    status: "True"
    type: Available
  extension:
    master: 'pool is degraded because nodes fail with "1 nodes are reporting degraded
      status on sync": "Node master-0 is reporting: \"machineconfig.machineconfiguration.openshift.io
      \\\"rendered-master-92deda58e6066601431e8302df5c9278\\\" not found\""'
    worker: all 0 nodes are at latest configuration rendered-worker-852da369a901f1a2ea5020c564aaa1de
  relatedObjects:
  - group: ""
    name: openshift-machine-config-operator
    resource: namespaces
  - group: machineconfiguration.openshift.io
    name: ""
    resource: machineconfigpools
  - group: machineconfiguration.openshift.io
    name: ""
    resource: controllerconfigs
  - group: machineconfiguration.openshift.io
    name: ""
    resource: kubeletconfigs
  - group: machineconfiguration.openshift.io
    name: ""
    resource: containerruntimeconfigs
  - group: machineconfiguration.openshift.io
    name: ""
    resource: machineconfigs
  - group: ""
    name: ""
    resource: nodes
  - group: ""
    name: openshift-kni-infra
    resource: namespaces
  - group: ""
    name: openshift-openstack-infra
    resource: namespaces
  - group: ""
    name: openshift-ovirt-infra
    resource: namespaces
  - group: ""
    name: openshift-vsphere-infra
    resource: namespaces

Additional environment details (platform, options, etc.):

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale
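The bot comments describe a simple timeline. With no activity, an issue turns stale after 90 days, rots 30 days after that, and closes 30 more days later unless someone comments `/lifecycle frozen`. The sketch below maps days of inactivity to a label under that timeline. It assumes the periods are cumulative, as the word "additional" suggests, and that a frozen issue never closes. `lifecycle_state` is a hypothetical name, not the bot's actual code.

```python
STALE_AFTER_DAYS = 90                      # "Issues go stale after 90d of inactivity."
ROTTEN_AFTER_DAYS = STALE_AFTER_DAYS + 30  # "Stale issues rot after an additional 30d"
CLOSE_AFTER_DAYS = ROTTEN_AFTER_DAYS + 30  # "Rotten issues close after an additional 30d"


def lifecycle_state(days_inactive: int, frozen: bool = False) -> str:
    """Label an issue by days without activity, using the thresholds from the
    bot comments (assumed cumulative). /lifecycle frozen is assumed to exempt it."""
    if frozen:
        return "frozen"
    if days_inactive >= CLOSE_AFTER_DAYS:
        return "closed"
    if days_inactive >= ROTTEN_AFTER_DAYS:
        return "rotten"
    if days_inactive >= STALE_AFTER_DAYS:
        return "stale"
    return "fresh"


for days in (30, 90, 120, 150):
    print(days, lifecycle_state(days))
```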

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-bot: Closing this issue.
