gcloud container clusters create "kueue" --zone "us-central1-c" --release-channel "regular" --machine-type "n2-standard-2" --num-nodes "2"
gcloud container node-pools create "spot" --cluster "kueue" --zone "us-central1-c" --machine-type "n2-standard-2" --spot --num-nodes "2"
VERSION=v0.3.2
kubectl apply -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml
kubectl api-resources | grep kueue
kubectl apply -f kueue-resource.yaml
This will create three Kueue resources:
1. ResourceFlavor
2. ClusterQueue
3. LocalQueue
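A sketch of what kueue-resource.yaml might contain; the resource names and quota values here are assumptions, following the kueue.x-k8s.io/v1beta1 API that ships with v0.3.x:

apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor            # hypothetical name; no node labels yet
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue             # hypothetical name
spec:
  namespaceSelector: {}           # admit workloads from all namespaces
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 6           # assumed quota
      - name: "memory"
        nominalQuota: 16Gi        # assumed quota
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: local-queue               # hypothetical name
  namespace: default
spec:
  clusterQueue: cluster-queue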
kubectl apply -f job-1.yaml -f job-2.yaml
kubectl get po -o wide
We will see that both node pools are used for these workloads, and that both jobs start running at the same time.
We may want specific workloads to run on specific hardware. That's where resource flavors with node labels come in.
We create two resource flavors, one for on-demand VMs and one for spot VMs. We then create a cluster queue for each resource flavor.
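A sketch of what resource-flavors.yaml might look like. The flavor names are assumptions; cloud.google.com/gke-nodepool is the standard GKE node-pool label (spot nodes also carry cloud.google.com/gke-spot: "true"):

apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: on-demand                  # hypothetical flavor name
spec:
  nodeLabels:
    cloud.google.com/gke-nodepool: "default-pool"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: spot                       # hypothetical flavor name
spec:
  nodeLabels:
    cloud.google.com/gke-nodepool: "spot"

When Kueue admits a workload through one of these flavors, it injects the flavor's nodeLabels into the pods as a nodeSelector, which is what pins the jobs to a specific pool.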
kubectl apply -f resource-flavors.yaml
kubectl apply -f cluster-queue.yaml
kubectl apply -f local-queue.yaml
Let's create some jobs on the on-demand pool.
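job-2-on-demand-lq.yaml presumably looks something like this; the queue name, image, and resource requests are assumptions, and in the v0.3.x releases the target queue is set with the kueue.x-k8s.io/queue-name annotation:

apiVersion: batch/v1
kind: Job
metadata:
  generateName: job-on-demand-     # kubectl create gives each job a unique name
  annotations:
    kueue.x-k8s.io/queue-name: on-demand-lq   # hypothetical LocalQueue name
spec:
  suspend: true                    # Kueue flips this to false when the job is admitted
  template:
    spec:
      containers:
      - name: main
        image: busybox
        command: ["sleep", "60"]
        resources:
          requests:
            memory: "3Gi"          # matches the rough per-job figure used below
      restartPolicy: Never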
kubectl create -f job-2-on-demand-lq.yaml
kubectl get po -o wide
We will see that all of the pods are scheduled on the on-demand pool. Now let's create some load on the on-demand pool and see how the queue handles it.
while true; do kubectl create -f job-2-on-demand-lq.yaml; sleep 3; done
We are creating a new job every 3 seconds. We have a total of 16GiB of memory, and each job requests roughly 3GiB, so only about 5 of these jobs can run at the same time. Kueue admits just those 5 jobs and holds back the rest, using the Kubernetes Job suspend feature to achieve this.
kubectl get clusterqueue -o wide
We will see that the cluster queue accumulates more and more pending workloads.
In Kueue we have a concept of cohorts for sharing resources. All ClusterQueues in the same cohort can share their resources as if they were in the same queue.
kubectl apply -f cluster-queue-cohort.yaml
By default, ClusterQueues in the same cohort can borrow 100% of each other's unused resources. We can change this by setting the borrowingLimit field on a flavor's resource quota in the ClusterQueue.
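A sketch of one of the queues in cluster-queue-cohort.yaml, assuming a cohort name of pool-cohort and the quota figures used above:

apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: on-demand-cq               # hypothetical name
spec:
  cohort: pool-cohort              # ClusterQueues sharing a cohort can borrow from each other
  namespaceSelector: {}
  resourceGroups:
  - coveredResources: ["memory"]
    flavors:
    - name: on-demand
      resources:
      - name: "memory"
        nominalQuota: 16Gi
        borrowingLimit: 8Gi        # optional: cap borrowing at 8Gi instead of the 100% default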
while true; do kubectl create -f job-2-on-demand-lq.yaml; sleep 1; done
If we now create some load on the spot pool, those jobs take priority over the queued on-demand jobs: a ClusterQueue only lends resources it has to spare, so the spot queue's own workloads are admitted against its quota first.
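job-2-spot-lq.yaml is presumably identical to the on-demand job except for the queue it targets:

apiVersion: batch/v1
kind: Job
metadata:
  generateName: job-spot-
  annotations:
    kueue.x-k8s.io/queue-name: spot-lq   # hypothetical spot LocalQueue name
spec:
  suspend: true
  template:
    spec:
      containers:
      - name: main
        image: busybox
        command: ["sleep", "60"]
        resources:
          requests:
            memory: "3Gi"
      restartPolicy: Never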
while true; do kubectl create -f job-2-spot-lq.yaml; sleep 4; done
We can see that the spot jobs are admitted ahead of the on-demand jobs.
Certain workloads require all of their pods to be scheduled before any work can begin. Kueue can be set up to do all-or-nothing scheduling. At this time it's a global setting, so if Kueue is installed with this configuration it applies to all workloads.
Without all-or-nothing scheduling we can end up in a deadlock: two jobs each get some of their pods scheduled, and each waits for resources the other is holding, so neither can make progress. This is a common problem in distributed systems.
./deadlock.sh
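configmap.yaml presumably enables waitForPodsReady in the Kueue controller configuration. A minimal sketch, assuming the ConfigMap name and key used by the release manifests (the other default configuration fields are omitted for brevity):

apiVersion: v1
kind: ConfigMap
metadata:
  name: kueue-manager-config
  namespace: kueue-system
data:
  controller_manager_config.yaml: |
    apiVersion: config.kueue.x-k8s.io/v1beta1
    kind: Configuration
    waitForPodsReady:
      enable: true   # hold further admissions until an admitted workload's pods are all ready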
kubectl apply -f configmap.yaml
Delete the existing Kueue controller pod so the new configuration is picked up.
kubectl delete po -n kueue-system --all
After the controller is back up and running, we can try to trigger the deadlock again. This time we will see that it no longer occurs.
./deadlock.sh
gcloud container clusters delete kueue --zone us-central1-c