k8s-spot-rescheduler doesn't handle pod disruption budgets nicely, leaving nodes underutilized and tainted
morganwalker opened this issue · 1 comment
We're using kops 1.10.0 and k8s 1.10.11, with two separate instance groups (IGs): `nodes` (on-demand) and `spots` (spot), both spread across 3 availability zones. I've applied the appropriate `nodeLabels` and defined the following flags in my k8s-spot-rescheduler deployment manifest:

```yaml
- --on-demand-node-label=on-demand
- --spot-node-label=spot
```
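For completeness, here's roughly how those flags sit in the Deployment's container spec. This is a minimal sketch; everything except the two label flags is illustrative rather than copied from our manifest:

```yaml
# Sketch of the k8s-spot-rescheduler container spec; only the two
# label flags below are from our real manifest, the rest is assumed.
containers:
  - name: k8s-spot-rescheduler
    image: quay.io/pusher/k8s-spot-rescheduler:v0.2.0  # illustrative tag
    command:
      - rescheduler
      - -v=2                        # illustrative verbosity
      - --running-in-cluster=true   # assumed in-cluster config flag
      - --on-demand-node-label=on-demand
      - --spot-node-label=spot
```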
The `nodes` IG has the `spot=false:PreferNoSchedule` taint so that the `spots` IG is preferred.
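For reference, the on-demand IG looks roughly like the sketch below. The taint, the label key, and the autoscaler tag keys are real; the name, sizes, machine type, and label/tag values are illustrative:

```yaml
# Hypothetical kops InstanceGroup for the on-demand nodes.
apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  name: nodes
spec:
  role: Node
  machineType: m4.large   # illustrative
  minSize: 3              # illustrative
  maxSize: 6              # illustrative
  nodeLabels:
    on-demand: "true"     # value illustrative; the rescheduler matches on the key
  taints:
    - spot=false:PreferNoSchedule
  cloudLabels:            # surfaced as ASG tags for autoscaler autodiscovery
    k8s.io/cluster-autoscaler/enabled: ""
    kubernetes.io/cluster/kubernetes.metis.wtf: ""
```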
I'm using the cluster autoscaler to autodiscover both IGs via `--node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,kubernetes.io/cluster/kubernetes.metis.wtf`, and these tags exist on both IGs (the `cloudLabels` above become the ASG tags). I've confirmed that pods on most `nodes` nodes can be drained and moved to `spots` nodes, with one exception:
- k8s-spot-rescheduler picks a node and logs `… moved. Will drain node.`, which isn't true.
- It then discovers that it can't drain the node because of a PDB, and aborts the drain (an example of a budget that can trigger this is sketched after the list):

```
E0117 14:03:51.801764 1 rescheduler.go:302] Failed to drain node: Failed to drain node /ip-172-20-61-39.ec2.internal, due to following errors: [Failed to evict pod skafos-notebooks/hub-deployment-cf799d494-gp6z4 within allowed timeout (last error: Cannot evict pod as it would violate the pod's disruption budget.)]
```
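For context on why the eviction fails: an eviction is rejected whenever the pod's PDB has no disruptions left to spend. A hypothetical budget like the one below (not our actual manifest) guarantees this for a single-replica deployment, since `minAvailable: 1` with one replica means zero allowed disruptions:

```yaml
# Hypothetical PDB covering the pod named in the log above; with one
# replica and minAvailable: 1, disruptionsAllowed is always 0, so
# every eviction attempt is rejected.
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: hub-deployment
  namespace: skafos-notebooks
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: hub   # illustrative selector
```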
Now we're left with an on-demand node that has had all of its pods evicted except those covered by PDBs, leaving the on-demand node underutilized and tainted with `ToBeDeletedByClusterAutoscaler`. It seems like the rescheduler should first check whether it can drain *all* pods, taking PDBs into account, and if it can't, evict no pods and skip the `ToBeDeletedByClusterAutoscaler` taint.
I am facing the same problem and have raised the PR below for it:
https://github.com/pusher/k8s-spot-rescheduler/pull/60/files