Running asg-roller at the same time as cluster-autoscaler results in a cluster of unschedulable nodes
tom-butler opened this issue · 1 comments
I've been trying to get ASG Roller to work with Cluster Autoscaler but the two seem to be clashing and resulting in a cluster of unschedulable nodes.
I think the following is happening:
- ASG Roller notices difference in launch template
- ASG Roller scales up cluster
- Cluster Autoscaler notices new nodes with no usage, and taints them as PreferNoSchedule
- ASG Roller cordons and drains old nodes (all nodes are now unschedulable)
The issue seems to be that cluster-autoscaler taints nodes before it scales them down, and the timing of that taint isn't configurable in cluster-autoscaler.
Could ASG roller be updated to set the annotation "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true" during scaling events?
I believe this would stop ASG Roller and Cluster Autoscaler from clashing.
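A rough sketch of what the proposal could look like in Go. This is not ASG Roller's actual code: `annotationPatch` and `scaleDownDisabled` are hypothetical names made up for illustration, and the sketch only builds the JSON merge patch bodies that would set and later remove the annotation on a Node. It assumes the roller would apply the first patch to the new nodes before draining the old ones, and the second once the roll is finished.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Annotation key quoted in this issue. The constant name is hypothetical.
const scaleDownDisabled = "cluster-autoscaler.kubernetes.io/scale-down-disabled"

// annotationPatch (hypothetical helper) builds a JSON merge patch body that
// sets one annotation on a Node. A nil value is encoded as JSON null, which
// in a merge patch deletes the key instead of setting it.
func annotationPatch(key string, value *string) ([]byte, error) {
	patch := map[string]interface{}{
		"metadata": map[string]interface{}{
			"annotations": map[string]interface{}{key: value},
		},
	}
	return json.Marshal(patch)
}

func main() {
	on := "true"

	// Patch to apply to the new nodes before the roll starts.
	protect, err := annotationPatch(scaleDownDisabled, &on)
	if err != nil {
		panic(err)
	}

	// Patch to apply once the roll is done, so the autoscaler can
	// consider those nodes for scale-down again.
	release, err := annotationPatch(scaleDownDisabled, nil)
	if err != nil {
		panic(err)
	}

	fmt.Println(string(protect))
	fmt.Println(string(release))
}
```

With client-go, bodies like these would normally be sent through the Nodes `Patch` call using the merge patch type. Where exactly that belongs in the roller's loop, and which nodes should be annotated, is an open design question rather than something settled here.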
Nice catch @tom-butler. The irony is that I originally wrote this while working with a prod deployment that also used cluster autoscaler. I was worried about this conflict, but in the end that deployment didn't need the roller, while a different one, which doesn't use autoscaler, did, so the conflict just didn't happen.
Yes, I think that is the correct process, and then remove the taint when done scaling.
Care to open a PR for it?