local-pvc-releaser

A Kubernetes controller designed to oversee Persistent Volume Claims (PVCs) associated with local storage on worker nodes. Its purpose is to enhance resilience and facilitate automatic recovery in the event of node termination.



Local-pvc-releaser is a Kubernetes controller that simplifies the management of Persistent Volume Claims (PVCs) when the cloud provider unexpectedly terminates a node. In such cases, the Local-pvc-releaser deletes the relevant PVCs, provided they are bound to a Persistent Volume (PV) that represents local storage on the faulty node.

Description

The Local-pvc-releaser controller automates the recovery of pods whose PVCs are bound to a PV that represents a local storage drive on a faulty node.
Previously, such pods moved to the "Pending" state and waited for the faulty node to recover, which never happens once the node has been terminated, so manual action was required to recover them.
The Local-pvc-releaser takes action by deleting those PVCs so that new ones can be created for the pods. For the common autoscalers, a new PVC represents demand for a new node (as long as the cluster has no available resources). Once the relevant resources are allocated, the Kubernetes scheduler schedules the pod and the recovery process is complete.

Compatibility

  • K8s version 1.26+
  • *Dynamic storage provisioners

Note:
The Local PVC Releaser relies on the well-known Kubernetes annotation volume.kubernetes.io/selected-node to link Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) with a terminated node.
Consequently, PVs created by static storage provisioners, such as the local-static-provisioner, are not managed: the binding between PV and PVC is not performed by the Kubernetes control plane, so this well-known annotation is not attached.
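
The selection rule in the note above can be sketched in a few lines of Go. This is a minimal illustration, not the controller's actual code: the `claim` type and both helper functions are hypothetical stand-ins, and the check that the bound PV is a local volume is left out. In upstream Kubernetes the volume.kubernetes.io/selected-node key is stored in the PVC's annotations, so that is where the sketch looks for it.

```go
package main

import "fmt"

// selectedNodeKey is the well-known key named in the note above.
const selectedNodeKey = "volume.kubernetes.io/selected-node"

// claim is a hypothetical, simplified stand-in for a PVC object.
// It carries only the fields this sketch needs.
type claim struct {
	Namespace   string
	Name        string
	Annotations map[string]string
}

// boundToNode reports whether the claim carries the selected-node key
// and whether its value matches the given node name.
func boundToNode(c claim, node string) bool {
	selected, ok := c.Annotations[selectedNodeKey]
	return ok && selected == node
}

// claimsOnNode returns the claims that are linked to the given node.
// Claims without the key (for example, statically provisioned ones)
// are skipped.
func claimsOnNode(claims []claim, node string) []claim {
	var out []claim
	for _, c := range claims {
		if boundToNode(c, node) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	claims := []claim{
		{Namespace: "db", Name: "data-0", Annotations: map[string]string{selectedNodeKey: "node-a"}},
		{Namespace: "db", Name: "data-1", Annotations: map[string]string{selectedNodeKey: "node-b"}},
		{Namespace: "db", Name: "static-0"}, // no selected-node key, never matched
	}
	for _, c := range claimsOnNode(claims, "node-a") {
		fmt.Printf("%s/%s\n", c.Namespace, c.Name)
	}
}
```

A claim without the key never matches, which is why statically provisioned volumes fall outside the controller's scope.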

How it works

The Local-pvc-releaser controller watches the events emitted by the Kubernetes Node Controller, which runs as part of the cluster control plane.
The Kubernetes Node Controller generates a "RemovingNode" event whenever a node object is removed. This usually happens when you scale down your cluster or when one of the master/worker nodes is terminated unexpectedly.
The Local-pvc-releaser watches those events and reconciles the state of the PVCs that are bound to PV objects backed by local storage on the faulty node.
Once the relevant PVCs are reconciled (deleted), the pod can get a new PVC object and recover, as long as available or new resources exist for it to be scheduled on.


Getting Started

To deploy this controller, you'll need a Kubernetes cluster to run against. You can use KIND to get a local cluster for testing, or run against a remote cluster (using the current context in kubeconfig).

Deploying using Helm

Deploy the controller using Helm:

$ helm repo add local-pvc-releaser https://AppsFlyer.github.io/local-pvc-releaser
$ helm install -n <namespace> <release-name> local-pvc-releaser/local-pvc-releaser

For more information, please refer here.

Uninstalling the Chart

To uninstall the local-pvc-releaser release:

$ helm uninstall -n <namespace> <release-name>

Observability

The Local-pvc-releaser controller publishes the base metrics provided by KubeBuilder, plus an additional custom metric that counts successful PVC deletions. The metrics are exposed through a Prometheus exporter. For more information, please refer here.

Custom metrics

deleted_pvc

Labels: namespace, controller_name, dryrun
Description: The number of PVC objects successfully deleted by the controller

Contributing

We appreciate and welcome any initiative for improvement. Before raising a PR, kindly make sure that your code passes all the required CI stages.

Local Deployment

Deploy by:

make deploy

Or, optionally, deploy the controller with a different image tag:

make deploy IMG=<some-registry>/local-pvc-releaser:<tag>

Undeploy controller

Undeploy the controller from the cluster:

make undeploy