Run the descheduler in your OpenShift cluster to move pods based on specific strategies.
- Build and push the operator image to a registry.
- Ensure the image spec in deploy/05_deployment.yaml refers to the operator image you pushed.
- Run oc create -f deploy/.
The following process builds the operator so that it can be installed locally via OperatorHub using a custom index image:
- build and push the image to a registry (e.g. https://quay.io):
  $ podman build -t quay.io/<username>/ose-cluster-kube-descheduler-operator-bundle:latest -f Dockerfile .
  $ podman push quay.io/<username>/ose-cluster-kube-descheduler-operator-bundle:latest
- build and push the image index for operator-registry (pull and build https://github.com/operator-framework/operator-registry/ to get the opm binary):
  $ ./bin/linux-amd64-opm index add --bundles quay.io/<username>/ose-cluster-kube-descheduler-operator-bundle:latest --tag quay.io/<username>/ose-cluster-kube-descheduler-operator-bundle-index:1.0.0
  $ podman push quay.io/<username>/ose-cluster-kube-descheduler-operator-bundle-index:1.0.0
  Don't forget to increase the number of open files, e.g. ulimit -n 100000, in case the current limit is insufficient.
- create and apply the catalogsource manifest:
  apiVersion: operators.coreos.com/v1alpha1
  kind: CatalogSource
  metadata:
    name: cluster-kube-descheduler-operator
    namespace: openshift-marketplace
  spec:
    sourceType: grpc
    image: quay.io/<username>/ose-cluster-kube-descheduler-operator-bundle-index:1.0.0
- create the cluster-kube-descheduler-operator namespace:
  $ oc create ns cluster-kube-descheduler-operator
- open the console Operators -> OperatorHub, search for descheduler operator and install the operator
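The image field of the catalogsource manifest has to match the index image tag pushed in the previous step. As a minimal sketch of that relationship, the following Python snippet renders the manifest for a given registry user. The helper names index_image and render_catalogsource and the user example-user are hypothetical, chosen only for illustration.

```python
# Illustrative sketch: fill the <username> placeholder of the catalogsource manifest.
CATALOGSOURCE_TEMPLATE = """\
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: cluster-kube-descheduler-operator
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: {index_image}
"""


def index_image(username, tag="1.0.0"):
    """Return the index image reference used in the opm and podman push steps."""
    return f"quay.io/{username}/ose-cluster-kube-descheduler-operator-bundle-index:{tag}"


def render_catalogsource(username, tag="1.0.0"):
    """Render the manifest text for the given (hypothetical) registry user."""
    return CATALOGSOURCE_TEMPLATE.format(index_image=index_image(username, tag))


print(render_catalogsource("example-user"))
```

The printed text can be saved to a file and applied with oc create -f.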
A sample CR definition is shown below (the operator expects a CR named cluster in the openshift-kube-descheduler-operator namespace):
apiVersion: operator.openshift.io/v1beta1
kind: KubeDescheduler
metadata:
  name: cluster
  namespace: openshift-kube-descheduler-operator
spec:
  deschedulingIntervalSeconds: 1800
  profiles:
  - AffinityAndTaints
The operator spec provides a profiles field, which lets users enable one or more descheduling profiles.
Each profile maps to a preconfigured policy definition that enables several descheduler strategies grouped by intent. The policies of all enabled profiles are merged.
The following profiles are currently provided:
- AffinityAndTaints
- TopologyAndDuplicates
- LifecycleAndUtilization
Each of these enables cluster-wide descheduling (excluding openshift and kube-system namespaces) based on certain goals.
AffinityAndTaints is the most basic descheduling profile. It removes running pods that violate node and pod affinity or node taints.
This profile enables the RemovePodsViolatingInterPodAntiAffinity, RemovePodsViolatingNodeAffinity, and RemovePodsViolatingNodeTaints strategies.
TopologyAndDuplicates attempts to balance pod distribution based on topology constraint definitions and by evicting duplicate copies of the same pod running on the same node. It enables the RemovePodsViolatingTopologySpreadConstraint and RemoveDuplicates strategies.
LifecycleAndUtilization focuses on pod lifecycles and node resource consumption. It evicts any running pod older than 24 hours and attempts to evict pods from "high utilization" nodes that can fit onto "low utilization" nodes. A high utilization node is any node consuming more than 50% of its available CPU, memory, or pod capacity. A low utilization node is any node using less than 20% of its available CPU, memory, and pod capacity.
This profile enables the LowNodeUtilization and PodLifeTime strategies. In the future, more configuration may be made available through the operator for these strategies based on user feedback.
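To make the thresholds above concrete, here is a small Python sketch that classifies a node from its usage percentages and checks a pod's age against the 24 hour limit. It is not the descheduler's implementation: the function names, the input shape, the "neither" label, and the handling of values exactly at a threshold are assumptions made for illustration.

```python
# Illustrative sketch only; not the descheduler's actual code.
# Threshold values come from the profile description above.
HIGH_THRESHOLD_PERCENT = 50  # above this on ANY resource: high utilization
LOW_THRESHOLD_PERCENT = 20   # below this on ALL resources: low utilization
MAX_POD_LIFETIME_SECONDS = 24 * 60 * 60  # pods older than 24 hours

RESOURCES = ("cpu", "memory", "pods")


def classify_node(usage_percent):
    """Return "high", "low", or "neither" for a dict of usage percentages.

    Hypothetical helper; keys are expected to be "cpu", "memory", and "pods".
    """
    values = [usage_percent[name] for name in RESOURCES]
    if any(value > HIGH_THRESHOLD_PERCENT for value in values):
        return "high"
    if all(value < LOW_THRESHOLD_PERCENT for value in values):
        return "low"
    return "neither"


def exceeds_lifetime(pod_age_seconds):
    """Return True when a pod is older than the 24 hour lifetime."""
    return pod_age_seconds > MAX_POD_LIFETIME_SECONDS


print(classify_node({"cpu": 65, "memory": 30, "pods": 10}))  # high: CPU is above 50%
print(classify_node({"cpu": 10, "memory": 15, "pods": 5}))   # low: everything is below 20%
print(classify_node({"cpu": 30, "memory": 30, "pods": 30}))  # neither
```

Note the asymmetry in the description: one resource above 50% is enough to make a node high utilization, while all three must be below 20% for it to count as low utilization.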
At a high level, the descheduler operator is responsible for watching the above CR and then:
- Creating a configmap that can be used by the descheduler.
- Running the descheduler as a deployment that mounts the configmap as a policy file in the pod.
The configmap created from the above sample CR definition looks like this:
apiVersion: descheduler/v1alpha1
kind: DeschedulerPolicy
strategies:
  RemovePodsViolatingInterPodAntiAffinity:
    enabled: true
    ...
  RemovePodsViolatingNodeAffinity:
    enabled: true
    params:
      ...
      nodeAffinityType:
      - requiredDuringSchedulingIgnoredDuringExecution
  RemovePodsViolatingNodeTaints:
    enabled: true
    ...
(Some generated parameters omitted.)
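As a rough illustration of how enabled profiles could be expanded and merged into a policy shaped like the one above, consider the following Python sketch. The mapping keys are the profile names (AffinityAndTaints, TopologyAndDuplicates, LifecycleAndUtilization) and the values are the strategies each profile is described as enabling. The function name build_policy and the dictionary layout are hypothetical, generated parameters are left out, and this is not the operator's own code.

```python
# Illustrative sketch only; the operator's real policy generation is not shown here.
PROFILE_STRATEGIES = {
    "AffinityAndTaints": [
        "RemovePodsViolatingInterPodAntiAffinity",
        "RemovePodsViolatingNodeAffinity",
        "RemovePodsViolatingNodeTaints",
    ],
    "TopologyAndDuplicates": [
        "RemovePodsViolatingTopologySpreadConstraint",
        "RemoveDuplicates",
    ],
    "LifecycleAndUtilization": [
        "LowNodeUtilization",
        "PodLifeTime",
    ],
}


def build_policy(profiles):
    """Merge the strategies of all enabled profiles into one policy dict."""
    strategies = {}
    for profile in profiles:
        if profile not in PROFILE_STRATEGIES:
            raise ValueError(f"unknown profile: {profile}")
        for name in PROFILE_STRATEGIES[profile]:
            # Parameters such as nodeAffinityType are omitted in this sketch.
            strategies[name] = {"enabled": True}
    return {
        "apiVersion": "descheduler/v1alpha1",
        "kind": "DeschedulerPolicy",
        "strategies": strategies,
    }


policy = build_policy(["AffinityAndTaints", "TopologyAndDuplicates"])
print(sorted(policy["strategies"]))
```

Because the strategies are collected in a dictionary keyed by strategy name, enabling several profiles yields a single merged strategies section.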
The Descheduler operator exposes the following parameters in its CRD:
- deschedulingIntervalSeconds: the number of seconds between descheduler runs
- profiles: the descheduler profiles that are enabled
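To show how the two parameters fit into the CR, the sketch below assembles a KubeDescheduler manifest as a Python dictionary and prints it as JSON, a format oc should also accept for manifests. The helper make_kubedescheduler and its validation rules (a positive integer interval, at least one profile) are assumptions for illustration, not checks documented for the CRD.

```python
import json


def make_kubedescheduler(interval_seconds, profiles):
    """Build a KubeDescheduler manifest as a dict (hypothetical helper)."""
    # Assumption: an interval is a positive whole number of seconds.
    if not isinstance(interval_seconds, int) or interval_seconds <= 0:
        raise ValueError("deschedulingIntervalSeconds must be a positive integer")
    # Assumption: a CR without any profile is treated as a mistake here.
    if not profiles:
        raise ValueError("at least one profile is expected")
    return {
        "apiVersion": "operator.openshift.io/v1beta1",
        "kind": "KubeDescheduler",
        "metadata": {
            "name": "cluster",
            "namespace": "openshift-kube-descheduler-operator",
        },
        "spec": {
            "deschedulingIntervalSeconds": interval_seconds,
            "profiles": list(profiles),
        },
    }


manifest = make_kubedescheduler(1800, ["AffinityAndTaints"])
print(json.dumps(manifest, indent=2))
```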