Incubating: Deploys all components needed to recognize unhealthy machines and apply different remediation strategies to them.

Primary language: Go. License: Apache License 2.0.

Machine Remediation

Remediation Flow

Architecture

Machine remediation contains components that monitor and remediate unhealthy machines on different platforms. It works on top of the machine-api-operator controllers.

It contains:

How to deploy

Check the GitHub releases for the latest YAML file, which includes the CRDs, RBAC rules, and deployment, and apply it to your cluster:

kubectl apply -f https://github.com/kubevirt/machine-remediation/releases/download/v0.4.3/machine-remediation.v0.4.3.yaml

Once the deployment finishes, create a MachineHealthCheck object and make sure it carries the healthchecking.openshift.io/strategy: reboot annotation, which instructs the Machine Healthcheck controller to delegate remediation to the machine remediation components.

An example MachineHealthCheck object that covers all machines with the worker role in the cluster is as follows:

apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
 name: simple-example
 namespace: openshift-machine-api
 annotations:
   healthchecking.openshift.io/strategy: reboot
spec:
 selector:
   matchLabels:
     machine.openshift.io/cluster-api-machine-role: worker
 unhealthyConditions:
 - type: Ready
   status: Unknown
   timeout: 60s

How to run e2e tests

You need a Kubernetes or OpenShift environment with at least two worker nodes. Then run:

export KUBECONFIG=/dir/cluster/kubeconfig
make e2e-tests-run