Binding and placement of pending Pods onto nodes is handled by the Kubernetes scheduler, kube-scheduler. Placement decisions are governed by configurable scheduling policies and rules, often called predicates and priorities. The scheduler's decision is based on the state of the cluster at the moment the Pod is scheduled. That state can change later, for example when labels, taints, or tolerations are updated or when new nodes are added, so you may want to relocate a Pod from one node to another. This is the job of a descheduler.
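To make the "filter, then rank" idea concrete, here is a minimal Python sketch of how predicates and priorities could combine. It is a toy model and not kube-scheduler's actual code: the `Node` and `Pod` classes and the `fits`, `score`, and `pick_node` functions are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cpu_free: float                      # free CPU cores (illustrative unit)
    taints: set = field(default_factory=set)


@dataclass
class Pod:
    name: str
    cpu_request: float
    tolerations: set = field(default_factory=set)


def fits(pod, node):
    """Predicate: the node has enough free CPU and the Pod tolerates every taint."""
    return node.cpu_free >= pod.cpu_request and node.taints <= pod.tolerations


def score(pod, node):
    """Priority: prefer the node with the most CPU left after placement."""
    return node.cpu_free - pod.cpu_request


def pick_node(pod, nodes):
    """Filter nodes with the predicate, then return the highest-scoring one."""
    feasible = [n for n in nodes if fits(pod, n)]
    if not feasible:
        return None                      # the Pod would stay pending
    return max(feasible, key=lambda n: score(pod, n))


nodes = [
    Node("node-a", 1.0),
    Node("node-b", 3.0),
    Node("node-c", 8.0, {"dedicated"}),  # tainted: only tolerating Pods fit
]
chosen = pick_node(Pod("web", 0.5), nodes)
print(chosen.name)
```

Note that `pick_node` only sees the nodes that exist when it is called. If a larger node joins the cluster afterwards, Pods that were already placed stay where they are, which is the gap a descheduler is meant to fill.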
There are a number of scenarios in which you may want to deschedule (evict) a Pod from a node. Let's discuss some of them:
You have just introduced a new node in the cluster and want to distribute the workload evenly.
Pod and node affinity requirements, such as taints or labels, have changed and the original scheduling decisions are no longer appropriate for certain nodes.
Pods are in a stale condition (readiness and health probes are returning errors) on a failed node, and manual eviction is required.
Similar pods are running on a single node.
A node with 20% or less of its CPU/memory reserved is considered underutilized. You may want to deschedule its Pods and remove the node altogether.
A node with 20% or less of its CPU/memory reserved is considered underutilized. You may want to deschedule its Pods once a new node is added to share the load.
You want to remove Pods violating inter-Pod anti-affinity.
You may also want to remove Pods violating node affinity.
Remove Pods violating node taints.
Remove Pods violating topology spread constraints.
Remove Pods having too many restarts, which may be caused by underlying node taints, available resources, etc.
Remove Pods that are too old; lifetime requirements can be configured in the descheduling policy.
Pods have been restarted many times.
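The 20% rule from the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `nodes` data and the function names are hypothetical, and the sketch assumes a node counts as underutilized only when both CPU and memory reservations are at or below the threshold.

```python
LOW_UTILIZATION_THRESHOLD = 0.20  # 20%, taken from the scenario above


def utilization(requested, allocatable):
    """Fraction of a resource that is reserved on a node."""
    return requested / allocatable


def underutilized_nodes(nodes, threshold=LOW_UTILIZATION_THRESHOLD):
    """Return names of nodes whose CPU and memory are both at or below threshold.

    Assumption: every tracked resource must be low for the node to qualify.
    """
    result = []
    for name, resources in nodes.items():
        if all(utilization(req, alloc) <= threshold
               for req, alloc in resources.values()):
            result.append(name)
    return result


# Hypothetical cluster: resource -> (requested, allocatable)
nodes = {
    "node-a": {"cpu": (3.0, 4.0), "memory": (12.0, 16.0)},
    "node-b": {"cpu": (0.5, 4.0), "memory": (2.0, 16.0)},
    "node-c": {"cpu": (0.4, 4.0), "memory": (8.0, 16.0)},
}
print(underutilized_nodes(nodes))
```

Here `node-c` has low CPU reservations but half of its memory is reserved, so under the "all resources" assumption it is not treated as underutilized.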
The descheduler is not a built-in Kubernetes service, which is why there are multiple options available from both commercial vendors and the open source community:
- OpenShift Descheduler
- Open source deschedulers
- Cloud provider marketplaces
- SaaS providers (e.g., Kaiops.io)
I have set up a basic cluster with a few nodes and some sample workloads running on them. We will use a few scenarios to observe the behavior of the cluster with and without the descheduler service:
- Add a new Node
- Remove Pod Duplicates
- Node affinity updated using taints and labels.
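As a preview of the duplicate-removal scenario, the following Python sketch shows one way duplicates could be identified. It is a simplified model, not the code of any particular descheduler: the `(pod, owner, node)` tuples and the `duplicate_pods` function are hypothetical, and the sketch assumes that Pods sharing the same owner on the same node count as duplicates.

```python
from collections import defaultdict


def duplicate_pods(pods):
    """Return Pods that share an owner and a node with an earlier Pod.

    The first Pod seen for each (owner, node) pair is kept; the rest
    are reported as eviction candidates.
    """
    grouped = defaultdict(list)
    for name, owner, node in pods:
        grouped[(owner, node)].append(name)

    candidates = []
    for names in grouped.values():
        candidates.extend(names[1:])     # keep one Pod per owner per node
    return candidates


# Hypothetical workload: (pod name, owner, node)
pods = [
    ("web-1", "deploy/web", "node-a"),
    ("web-2", "deploy/web", "node-a"),
    ("web-3", "deploy/web", "node-b"),
    ("db-1", "sts/db", "node-a"),
]
print(duplicate_pods(pods))
```

Evicting a candidate only helps if the scheduler can then place the replacement on a different node, so this check pairs naturally with the "add a new node" scenario.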