pusher/k8s-spot-rescheduler

k8s-spot-rescheduler doesn't handle pod disruption budgets nicely, leaving nodes underutilized and tainted

morganwalker opened this issue · 1 comment

We're on kops 1.10.0 and k8s 1.10.11, with two separate instance groups (IGs), nodes (on-demand) and spots (spot), both spread across 3 availability zones. I've applied the appropriate nodeLabels and defined the following in my k8s-spot-rescheduler deployment manifest:

- --on-demand-node-label=on-demand
- --spot-node-label=spot
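These two flags tell the rescheduler which nodes are drain candidates (on-demand) and which are targets for the moved pods (spot). Below is a minimal sketch of how such a flag value could be matched against node labels. It assumes the value is treated as a simplified Kubernetes label selector, where a bare `key` means "label exists" and `key=value` means equality. The helper names and example node labels are hypothetical and not taken from the rescheduler's code:

```python
from typing import Dict, List


def matches_selector(selector: str, labels: Dict[str, str]) -> bool:
    """Simplified label-selector match (hypothetical helper).

    Supports only comma-separated 'key' (label must exist) and
    'key=value' (label must equal value) terms.
    """
    for term in selector.split(","):
        term = term.strip()
        if "=" in term:
            key, value = term.split("=", 1)
            if labels.get(key) != value:
                return False
        elif term not in labels:
            return False
    return True


def classify(nodes: Dict[str, Dict[str, str]], on_demand: str, spot: str) -> Dict[str, List[str]]:
    """Group node names by which flag's selector their labels satisfy."""
    groups: Dict[str, List[str]] = {"on-demand": [], "spot": [], "other": []}
    for name, labels in nodes.items():
        if matches_selector(on_demand, labels):
            groups["on-demand"].append(name)
        elif matches_selector(spot, labels):
            groups["spot"].append(name)
        else:
            groups["other"].append(name)
    return groups


# Hypothetical node labels, as kops nodeLabels might set them.
nodes = {
    "ip-10-0-0-1": {"on-demand": "true"},
    "ip-10-0-0-2": {"spot": "true"},
    "ip-10-0-0-3": {},
}
print(classify(nodes, on_demand="on-demand", spot="spot"))
# {'on-demand': ['ip-10-0-0-1'], 'spot': ['ip-10-0-0-2'], 'other': ['ip-10-0-0-3']}
```

With a bare `on-demand` selector, any value of that label matches. Whether the real flags accept `key=value`, or only a bare key, is worth checking against the project's README.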

The nodes IG has the spot=false:PreferNoSchedule taint so that the spots IG is preferred. I'm using the cluster autoscaler to autodiscover both IGs via --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,kubernetes.io/cluster/kubernetes.metis.wtf, and these tags exist on both IGs. I've confirmed that pods on most nodes can be drained and moved to spots nodes, with one exception:

  • k8s-spot-rescheduler picks a node and states

    moved. Will drain node.
    

    which isn't true.

  • It then finds that it can't drain the node because of PDBs:

    E0117 14:03:51.801764       1 rescheduler.go:302] Failed to drain node: Failed to drain node /ip-172-20-61-39.ec2.internal, due to following errors: [Failed to evict pod skafos-notebooks/hub-deployment-cf799d494-gp6z4 within allowed timeout (last error: Cannot evict pod as it would violate the pod's disruption budget.)]
    

    and aborts the drain.
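The sequence in the two bullets can be reproduced with a toy model. This is a simplified sketch, not the rescheduler's code. It assumes the node is tainted before any eviction, that pods are evicted one at a time, and that each PDB is reduced to a single hypothetical counter of remaining allowed disruptions. The real drain retries evictions until a timeout, as the log shows, but the end state matches the one described in this issue:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

TO_BE_DELETED = "ToBeDeletedByClusterAutoscaler"


@dataclass
class Pod:
    name: str
    pdb: Optional[str] = None  # PDB covering this pod, if any (toy model)


@dataclass
class Node:
    name: str
    pods: List[Pod]
    taints: List[str] = field(default_factory=list)


class EvictionRefused(Exception):
    """Stands in for the API refusing an eviction because of a PDB."""


def evict(pod: Pod, budgets: Dict[str, int]) -> None:
    # budgets maps PDB name -> disruptions still allowed (hypothetical model).
    if pod.pdb is not None:
        if budgets.get(pod.pdb, 0) <= 0:
            raise EvictionRefused(f"{pod.name}: would violate {pod.pdb}")
        budgets[pod.pdb] -= 1


def naive_drain(node: Node, budgets: Dict[str, int]) -> bool:
    node.taints.append(TO_BE_DELETED)  # taint first...
    for pod in list(node.pods):
        try:
            evict(pod, budgets)  # ...then evict one by one
        except EvictionRefused:
            return False  # abort: taint and earlier evictions are not rolled back
        node.pods.remove(pod)
    return True


node = Node("ip-172-20-61-39.ec2.internal", [
    Pod("app-1"), Pod("app-2"), Pod("hub-deployment", pdb="hub-pdb"),
])
ok = naive_drain(node, {"hub-pdb": 0})
print(ok, [p.name for p in node.pods], node.taints)
# False ['hub-deployment'] ['ToBeDeletedByClusterAutoscaler']
```

The pod names `app-1` and `app-2` are hypothetical. The point is only that a refusal partway through leaves the node tainted with just the PDB-protected pod still on it.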

Now we're left with an on-demand node that has had all of its pods evicted except those covered by PDBs, leaving it underutilized and tainted with ToBeDeletedByClusterAutoscaler. It seems like the rescheduler should first check whether it can drain all pods, taking PDBs into account. If it can't, it should neither evict any pods nor apply the ToBeDeletedByClusterAutoscaler taint.
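A minimal sketch of the all-or-nothing behaviour proposed above, in a toy model where each PDB is reduced to a hypothetical counter of remaining allowed disruptions. Before tainting, it counts how many pods on the node each PDB covers and compares that with the PDB's remaining budget. Only if every budget has room does the node get the ToBeDeletedByClusterAutoscaler taint and have its pods evicted. All names are hypothetical, and this is not necessarily what the PR linked below implements:

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional

TO_BE_DELETED = "ToBeDeletedByClusterAutoscaler"


@dataclass
class Pod:
    name: str
    pdb: Optional[str] = None  # PDB covering this pod, if any (toy model)


@dataclass
class Node:
    name: str
    pods: List[Pod]
    taints: List[str] = field(default_factory=list)


def pdbs_permit_full_drain(pods: List[Pod], budgets: Dict[str, int]) -> bool:
    """True if every PDB allows as many disruptions as this node needs."""
    needed = Counter(p.pdb for p in pods if p.pdb is not None)
    # A PDB missing from budgets is treated as allowing no disruptions.
    return all(budgets.get(pdb, 0) >= n for pdb, n in needed.items())


def drain_all_or_nothing(node: Node, budgets: Dict[str, int]) -> bool:
    if not pdbs_permit_full_drain(node.pods, budgets):
        return False  # node untouched: no taint, no evictions
    node.taints.append(TO_BE_DELETED)
    for pod in list(node.pods):
        if pod.pdb is not None:
            budgets[pod.pdb] -= 1  # consume one allowed disruption
        node.pods.remove(pod)
    return True


blocked = Node("on-demand-a", [Pod("app-1"), Pod("hub", pdb="hub-pdb")])
print(drain_all_or_nothing(blocked, {"hub-pdb": 0}), len(blocked.pods), blocked.taints)
# False 2 []

free = Node("on-demand-b", [Pod("app-2"), Pod("hub", pdb="hub-pdb")])
print(drain_all_or_nothing(free, {"hub-pdb": 1}), len(free.pods), free.taints)
# True 0 ['ToBeDeletedByClusterAutoscaler']
```

The check is conservative, because it assumes all covered pods are disrupted at once. It is also racy, since a budget can change between the check and the evictions. A robust fix would therefore probably also remove the taint if a drain still fails.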

I'm facing the same problem.
I've raised the PR below to address it:
https://github.com/pusher/k8s-spot-rescheduler/pull/60/files