Windows MCO for OpenShift that handles addition of Windows nodes to the cluster

Windows Machine Config Operator

Introduction

The Windows Machine Config Operator configures Windows instances into nodes, enabling Windows container workloads to be run within OKD/OCP clusters. Windows instances can be added either by creating a MachineSet or by specifying existing instances through a ConfigMap. The operator performs all the steps necessary to configure the instance so that it can join the cluster as a worker node.

More design details can be explored in the WMCO enhancement.

Pre-requisites

Usage

Installation

The operator can be installed from the community-operators catalog on OperatorHub. It can also be built and installed manually from source; see the development instructions.

Create a private key secret

Once the openshift-windows-machine-config-operator namespace has been created, a secret must be created containing the private key that will be used to access the Windows instances:

# Create secret containing the private key in the openshift-windows-machine-config-operator namespace
oc create secret generic cloud-private-key --from-file=private-key.pem=/path/to/key -n openshift-windows-machine-config-operator

We strongly recommend not using the same private key that was used when installing the cluster.

Changing the private key secret

Changing the private key used by WMCO can be done by updating the contents of the existing cloud-private-key secret. Some important things to note:

  • Any existing Windows Machines will be destroyed and recreated in order to make use of the new key. This will be done one at a time, until all Machines have been handled.
  • BYOH instances must be updated by the user, such that the new public key is present within the authorized_keys file. You are free to remove the previous key. If the new key is not authorized, WMCO will not be able to access any BYOH nodes. Upgrades and Node removal will not work properly until this step is complete.
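
One possible way to update the secret in place is sketched below. It renders the secret locally with a client-side dry run and pipes the result to oc replace. The helper name rotate_wmco_key and the key path passed to it are hypothetical, and this is only one approach, not necessarily the one the project recommends.

```shell
# Sketch: replace the contents of the existing cloud-private-key secret.
# rotate_wmco_key is a hypothetical helper; the key path argument is an assumption.
rotate_wmco_key() {
  key_path="$1"
  ns="openshift-windows-machine-config-operator"
  # Render the secret locally (nothing is sent to the cluster by the first
  # command), then replace the live object with the rendered manifest.
  oc create secret generic cloud-private-key \
    --from-file=private-key.pem="$key_path" \
    -n "$ns" --dry-run=client -o yaml | oc replace -n "$ns" -f -
}
```

It could be invoked as rotate_wmco_key /path/to/new-key. For BYOH instances, the matching public key still has to be added to authorized_keys by hand.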

Configuring BYOH (Bring Your Own Host) Windows instances

Instance Pre-requisites

Any Windows instances that are to be attached to the cluster as a node must fulfill these pre-requisites.

Adding instances

A ConfigMap named windows-instances must be created in the WMCO namespace, describing the instances that should be joined to a cluster. The required information to configure an instance is:

  • An address to SSH into the instance with. This can be a DNS name or an IPv4 address.
    • It is highly recommended that a DNS name is provided when instance IPs are assigned via DHCP. If not, it will be up to the user to update the windows-instances ConfigMap whenever an instance is assigned a new IP.
  • The name of the administrator user set up as part of the instance pre-requisites.

Each entry in the data section of the ConfigMap should be formatted with the address as the key, and a value with the format of username=<username>. Please see the example below:

kind: ConfigMap
apiVersion: v1
metadata:
  name: windows-instances
  namespace: openshift-windows-machine-config-operator
data:
  10.1.42.1: |-
    username=Administrator
  instance.example.com: |-
    username=core
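
When there are many instances, the ConfigMap can be generated rather than written by hand. The following is a minimal sketch that prints a manifest in the format shown above from address=username pairs. The function name windows_instances_cm is hypothetical and no input validation is done.

```shell
# Sketch: print a windows-instances ConfigMap from ADDRESS=USERNAME arguments.
# windows_instances_cm is a hypothetical helper name.
windows_instances_cm() {
  cat <<'EOF'
kind: ConfigMap
apiVersion: v1
metadata:
  name: windows-instances
  namespace: openshift-windows-machine-config-operator
data:
EOF
  for pair in "$@"; do
    addr="${pair%%=*}"   # text before the first '='
    user="${pair#*=}"    # text after the first '='
    printf '  %s: |-\n    username=%s\n' "$addr" "$user"
  done
}
```

For example, windows_instances_cm 10.1.42.1=Administrator instance.example.com=core reproduces the manifest above, and its output can be piped to oc apply -f -.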

Removing BYOH Windows instances

BYOH instances that are attached to the cluster as a node can be removed by deleting the instance's entry in the ConfigMap. This process reverts instances to the state they were in before being configured, barring any logs and container runtime artifacts.

In order for an instance to be cleanly removed, it must be accessible with the current private key provided to WMCO.

For example, in order to remove the instance 10.1.42.1 from the above example, the ConfigMap would be changed to the following:

kind: ConfigMap
apiVersion: v1
metadata:
  name: windows-instances
  namespace: openshift-windows-machine-config-operator
data:
  instance.example.com: |-
    username=core
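
As an alternative to re-applying the whole manifest, a single entry can be removed with a JSON patch. The sketch below only builds the patch document. The helper name remove_instance_patch is hypothetical. It relies on ConfigMap keys being limited to alphanumerics, '-', '_' and '.', so no JSON Pointer escaping is needed.

```shell
# Sketch: build a JSON patch that removes one entry from the ConfigMap data.
# remove_instance_patch is a hypothetical helper name.
remove_instance_patch() {
  # ConfigMap keys cannot contain '/' or '~', so the key is used verbatim.
  printf '[{"op":"remove","path":"/data/%s"}]\n' "$1"
}
```

The patch could then be applied with oc patch configmap windows-instances -n openshift-windows-machine-config-operator --type=json -p "$(remove_instance_patch 10.1.42.1)".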

Deleting windows-instances is viewed as a request to deconfigure all Windows instances added as Nodes.

Configuring Windows instances provisioned through MachineSets

Below is an example of a vSphere Windows MachineSet that creates Windows Machines the WMCO can react to. The windows-user-data secret is created lazily by the WMCO when it configures the first Windows Machine; after that, windows-user-data is available for subsequent MachineSets to consume. It might take around 10 minutes for the Windows instance to be configured and join the cluster. The MachineSet must have the following labels:

  • machine.openshift.io/os-id: Windows
  • machine.openshift.io/cluster-api-machine-role: worker
  • machine.openshift.io/cluster-api-machine-type: worker

The following label has to be added to the Machine spec within the MachineSet spec:

  • node-role.kubernetes.io/worker: ""

Not having these labels will result in the Windows node not being marked as a worker.

<infrastructureID> should be replaced with the output of:

oc get -o jsonpath='{.status.infrastructureName}{"\n"}' infrastructure cluster
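
If the MachineSet template is kept in a file, the substitution can be scripted. This is a minimal sketch using sed. The function name fill_infrastructure_id and the file name machineset.yaml in the usage note are hypothetical, and the ID is assumed to contain no '|', '&' or '\' characters.

```shell
# Sketch: replace every <infrastructureID> placeholder read from stdin.
# fill_infrastructure_id is a hypothetical helper name.
fill_infrastructure_id() {
  # Assumes the ID has no '|', '&' or '\' characters, which sed would
  # interpret specially in the replacement text.
  sed "s|<infrastructureID>|$1|g"
}
```

A possible invocation is fill_infrastructure_id "$(oc get -o jsonpath='{.status.infrastructureName}' infrastructure cluster)" < machineset.yaml, with the remaining vSphere placeholders still to be filled in by hand.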

The following template variables must be replaced with values from your vSphere environment:

  • <Windows_VM_template>: template name
  • <VM Network Name>: network name, must match the network that the cluster's Linux workers are on
  • <vCenter DataCenter Name>: datacenter name
  • <Path to VM Folder in vCenter>: path where your OpenShift cluster is running
  • <vCenter Datastore Name>: datastore name
  • <vCenter Server FQDN/IP>: IP address or FQDN of the vCenter server

IMPORTANT:

  • The VM template provided in the MachineSet must use a supported Windows Server version, as described in vSphere prerequisites.
  • On vSphere, Windows Machine names cannot be more than 15 characters long. The MachineSet name, therefore, cannot be more than 9 characters long, due to the way Machine names are generated from it.

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: <infrastructureID>
  name: winworker
  namespace: openshift-machine-api
spec:
  replicas: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: <infrastructureID>
      machine.openshift.io/cluster-api-machineset: winworker
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: <infrastructureID>
        machine.openshift.io/cluster-api-machine-role: worker
        machine.openshift.io/cluster-api-machine-type: worker
        machine.openshift.io/cluster-api-machineset: winworker
        machine.openshift.io/os-id: Windows
    spec:
      metadata:
        labels:
          node-role.kubernetes.io/worker: ""
      providerSpec:
        value:
          apiVersion: vsphereprovider.openshift.io/v1beta1
          credentialsSecret:
            name: vsphere-cloud-credentials
          diskGiB: 128
          kind: VSphereMachineProviderSpec
          memoryMiB: 16384
          metadata:
            creationTimestamp: null
          network:
            devices:
            - networkName: "<VM Network Name>"
          numCPUs: 4
          numCoresPerSocket: 1
          snapshot: ""
          template: <Windows_VM_template>
          userDataSecret:
            name: windows-user-data
          workspace:
            datacenter: <vCenter DataCenter Name>
            datastore: <vCenter Datastore Name>
            folder: <Path to VM Folder in vCenter> # e.g. /DC/vm/ocp45-2tdrm
            server: <vCenter Server FQDN/IP>
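
Because of the 9-character limit on vSphere MachineSet names noted above, it can help to check a name before applying the manifest. The following is a small sketch of such a check; the function name check_machineset_name is hypothetical.

```shell
# Sketch: fail if a vSphere Windows MachineSet name exceeds 9 characters.
# check_machineset_name is a hypothetical helper name.
check_machineset_name() {
  name="$1"
  if [ "${#name}" -gt 9 ]; then
    echo "MachineSet name '$name' is ${#name} characters; the limit is 9" >&2
    return 1
  fi
}
```

The name winworker used in the example is exactly 9 characters long, so it passes the check.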

Example MachineSet for other cloud providers:

Alternatively, the hack/machineset.sh script can be used to generate MachineSets for AWS and Azure platforms. The script generates a MachineSet.yaml file, which can be edited before use or used as is. The script also takes the optional arguments apply and delete to create or delete the MachineSet directly on the cluster without generating a YAML file.

Usage:

./hack/machineset.sh                 # to generate yaml file
./hack/machineset.sh apply/delete    # to create/delete MachineSet directly on cluster

Windows nodes Kubernetes component upgrade

When a new version of WMCO is released that is compatible with the current cluster version, an operator upgrade takes place, which results in the Kubernetes components on the Windows Machines being upgraded. For a non-disruptive upgrade, WMCO terminates the Windows Machines configured by the previous version of WMCO and recreates them using the current version. This is done by deleting the Machine object, which results in the drain and deletion of the Windows node. To facilitate an upgrade, WMCO adds a version annotation to all configured nodes. During an upgrade, a mismatch in the version annotation results in the deletion and recreation of the Windows Machine. To minimize service disruption during an upgrade, WMCO makes sure that the cluster has at least 1 Windows Machine per MachineSet in the running state.

WMCO is not responsible for Windows operating system updates. The cluster administrator provides the Windows image when creating the VMs and is therefore responsible for providing an updated image. The cluster administrator can provide an updated image by changing the image in the MachineSet spec.

Enabled features

Autoscaling Windows nodes

Cluster autoscaling is supported for Windows instances.

Container Runtime

Windows instances brought up with WMCO are set up with the containerd container runtime. As WMCO installs and manages the container runtime, it is recommended not to preinstall containerd in MachineSet or BYOH Windows instances.

Limitations

DeploymentConfigs

Windows Nodes do not support workloads created via DeploymentConfigs. Please use a normal Deployment, or another method, to deploy workloads.

Cluster-wide proxy

WMCO does not support adding Windows workloads using a cluster-wide proxy config for the OpenShift Container Platform. WMCO will not be able to automatically route proxy connections for Windows workloads.

Storage

At this time, only in-tree storage is supported in all cloud providers.

Pod Autoscaling

Horizontal and vertical Pod autoscaling are not available for Windows workloads.

Other limitations

WMCO / Windows nodes do not work with the following products:

Accessing secure registries

Windows nodes managed by WMCO do not support pulling container images from secure private registries. It is recommended to use images from public registries or pre-pull the images in the VM image.

Trunk port

WMCO does not support adding Windows nodes to a cluster through a trunk port. The only supported networking setup for adding Windows nodes is through an access port carrying the VLAN traffic.

Running Windows workloads

Be sure to set the OS field in the Pod spec (spec.os.name) to windows when deploying Windows workloads. This field is used to authoritatively identify the pod OS for validation. In OpenShift, it is used when enforcing OS-specific pod security standards.
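
To illustrate where the field goes, the sketch below prints a minimal Deployment whose Pod template sets spec.os.name to windows and selects Windows nodes through the well-known kubernetes.io/os label. The function name windows_deployment, the app label, and the name and image arguments are all hypothetical placeholders, not values taken from this project.

```shell
# Sketch: print a minimal Deployment for a Windows workload.
# windows_deployment, the app label, and both arguments are hypothetical.
windows_deployment() {
  name="$1"
  image="$2"
  cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      os:
        name: windows            # the OS field in the Pod spec
      nodeSelector:
        kubernetes.io/os: windows
      containers:
      - name: ${name}
        image: ${image}
EOF
}
```

The output can be piped to oc apply -f -. If the Windows nodes carry a taint, the Pod template also needs a matching toleration, which this sketch omits.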

Development

See HACKING.md.