openshift/machine-config-operator

Custom MachineConfigPool created for KubeletConfig resource is not configured

bharath-b-rh opened this issue · 6 comments

Custom MachineConfigPool created for a KubeletConfig resource is not getting configured.

When a KubeletConfig is created, kubelet-config-controller generates a MachineConfig from the template matching the pool's role, and the MCP is then configured with the matching MachineConfigs.
However, kubelet-config-controller expects the matching MCP to already be configured before it will generate the required MachineConfig, while render-controller expects at least one matching MachineConfig to exist before it will configure the MCP. This creates a deadlock, and the kubelet configuration is never applied to the machines.
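In terms of the resources involved, the circular dependency looks like this. The fragment below is taken from the custom pool in the reproduction steps that follow, with the two waiting conditions called out as comments:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: nodeobservability
spec:
  machineConfigSelector:
    matchLabels:
      # render-controller will not render the pool until at least one
      # MachineConfig carries this label ("no MachineConfigs found
      # matching selector" in the logs below)...
      machineconfiguration.openshift.io/role: nodeobservability
# ...while kubelet-config-controller will not generate that MachineConfig
# until the pool reports a rendered configuration ("Pool nodeobservability
# is unconfigured" in the logs below).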

Steps to reproduce the issue:

  1. Create a custom MCP resource like the one below:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  labels:
    machineconfigs.nodeobservability.olm.openshift.io/profiling: ""
    machineconfiguration.openshift.io/role: nodeobservability
  name: nodeobservability
spec:
  configuration:
    source:
    - apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      name: 10-nodeobservability-generated-kubelet
  machineConfigSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: nodeobservability
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""
  2. Create a KubeletConfig resource like the one below:
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  labels:
    machineconfigs.nodeobservability.olm.openshift.io/profiling: ""
    machineconfiguration.openshift.io/role: nodeobservability
  name: 10-kubelet-nodeobservability
spec:
  kubeletConfig:
    enableProfilingHandler: true
  machineConfigPoolSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: nodeobservability

Describe the results you received:
For the KubeletConfig created, the equivalent MachineConfig was not generated, and the changes were not applied to the matching machines through the custom MCP created for this scenario.

The logs below are from machine-config-controller:

I0418 06:16:13.658835       1 node_controller.go:723] Pool nodeobservability is unconfigured, pausing 5s for renderer to initialize
I0418 06:16:13.663919       1 render_controller.go:377] Error syncing machineconfigpool nodeobservability: no MachineConfigs found matching selector machineconfiguration.openshift.io/role=nodeobservability
I0418 06:16:13.673355       1 render_controller.go:377] Error syncing machineconfigpool nodeobservability: no MachineConfigs found matching selector machineconfiguration.openshift.io/role=nodeobservability
I0418 06:16:13.678003       1 kubelet_config_controller.go:304] Error syncing kubeletconfig 10-kubelet-nodeobservability: Pool nodeobservability is unconfigured, pausing 5s for renderer to initialize
I0418 06:16:13.688066       1 render_controller.go:377] Error syncing machineconfigpool nodeobservability: no MachineConfigs found matching selector machineconfiguration.openshift.io/role=nodeobservability
I0418 06:16:13.712632       1 render_controller.go:377] Error syncing machineconfigpool nodeobservability: no MachineConfigs found matching selector machineconfiguration.openshift.io/role=nodeobservability

Describe the results you expected:
Expected the configuration provided in the KubeletConfig to be applied to the machines through the matching MCP.

Additional information you deem important (e.g. issue happens only occasionally):
The issue is observed only when the MCP is configured for just a KubeletConfig; if any MachineConfig has matching labels, the scenario works as expected (see the sketch below).
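Based on that observation, a likely workaround is to create a minimal placeholder MachineConfig carrying the pool's machineConfigSelector label, so that render-controller has something to render. The name and the empty Ignition config below are illustrative assumptions, not taken from the issue:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    # Matches the pool's machineConfigSelector, so render-controller can
    # render the pool and kubelet-config-controller can proceed.
    machineconfiguration.openshift.io/role: nodeobservability
  name: 00-nodeobservability-placeholder   # hypothetical name
spec:
  config:
    ignition:
      version: 3.2.0
  # Intentionally empty: it changes nothing on the nodes; it exists only
  # so that the selector matches at least one MachineConfig.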

Output of oc adm release info --commits | grep machine-config-operator:

  machine-config-operator                        https://github.com/openshift/machine-config-operator                        14a1ca2cb91ff7e0faf9146b21ba12cd6c652d22

Additional environment details (platform, options, etc.):
Cluster created using CRC.

CodeReady Containers version: 2.0.1+bf3b1a6
OpenShift version: 4.10.3
Podman version: 3.4.4

Observation
kubelet-config-controller uses the name of the MCP to choose the template for generating the MachineConfig: it expects the name to be either 'master' or 'worker', and defaults to 'worker' if the name matches neither. Should the "machineconfiguration.openshift.io/role" label not be used for picking the template instead?
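For illustration, under the current name-based selection a hypothetical pool like the one below would be rendered from the 'worker' template regardless of what its role label says; the names here are made up, not taken from the issue:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  labels:
    # Present on the pool, but not consulted for template selection today;
    # the suggestion above is to pick the template from this label.
    machineconfiguration.openshift.io/role: nodeobservability
  name: observability-profiling   # neither 'master' nor 'worker' -> 'worker' template
spec:
  machineConfigSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: nodeobservability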

Please provide a must-gather from the cluster, and please verify that you followed the instructions located at: https://github.com/openshift/machine-config-operator/blob/master/docs/KubeletConfigDesign.md#example---setting-the-kubelet-log-level

must-gather-3090.tar.gz

Summary shared by the must-gather tool:

ClusterID: 7832c438-4598-47c3-af94-623c7c0c45db
ClusterVersion: Stable at "4.10.3"
ClusterOperators:
clusteroperator/cloud-credential is missing
clusteroperator/cluster-autoscaler is missing
clusteroperator/insights is missing
clusteroperator/kube-storage-version-migrator is missing
clusteroperator/monitoring is missing
clusteroperator/storage is missing

Observed the logs below when the scenario mentioned in #3094 was checked in a CRC-created cluster, same as above.

I0422 04:36:33.698299       1 render_controller.go:527] Pool nodeobservability: now targeting: rendered-nodeobservability-18966e749a11bf8e03c578db41cf041d
I0422 04:36:33.700779       1 render_controller.go:377] Error syncing machineconfigpool nodeobservability: Operation cannot be fulfilled on machineconfigpools.machineconfiguration.openshift.io "nodeobservability": the object has been modified; please apply your changes to the latest version and try again
W0422 04:36:38.698771       1 node_controller.go:808] can't get pool for node "crc-jw57j-master-0": node crc-jw57j-master-0 has both master role and custom role nodeobservability
W0422 04:36:38.698809       1 node_controller.go:808] can't get pool for node "crc-jw57j-master-0": node crc-jw57j-master-0 has both master role and custom role nodeobservability
I0422 04:36:38.698825       1 status.go:90] Pool nodeobservability: All nodes are updated with rendered-nodeobservability-18966e749a11bf8e03c578db41cf041d
W0422 04:36:43.704839       1 node_controller.go:808] can't get pool for node "crc-jw57j-master-0": node crc-jw57j-master-0 has both master role and custom role nodeobservability
W0422 04:36:43.704865       1 node_controller.go:808] can't get pool for node "crc-jw57j-master-0": node crc-jw57j-master-0 has both master role and custom role nodeobservability

A must-gather has been collected for both scenarios.

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.