NodeGroup ASGs should be per-AZ, not balanced across AZs
Closed this issue · 1 comments
chrisfarms commented
What
We should have a separate AutoScalingGroup per AvailabilityZone for each NodeGroup.
Why
- AutoScaling groups attempt to "rebalance" across AZs. This is rarely what we want for our setup, and it can cause problems for Pods that use EBS volumes, since those volumes are bound to a single AZ.
- Cluster autoscaler does not support Auto Scaling Groups which span multiple Availability Zones; instead you should use an Auto Scaling Group for each Availability Zone and enable the --balance-similar-node-groups feature. If you do use a single Auto Scaling Group that spans multiple Availability Zones you will find that AWS unexpectedly terminates nodes without them being drained because of the rebalancing feature.
- Per-AZ groups give us more fine-grained control over the nodes in each AZ, rather than scaling up in multiples of 3 nodes each time.
More info here: https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/aws
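To make the proposal concrete, here is a rough sketch of expanding one NodeGroup into one ASG definition per AZ. It only illustrates the shape: `AsgSpec`, `per_az_asgs`, the naming scheme, the zone names and the size limits are all hypothetical and are not taken from our actual templates.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AsgSpec:
    """Hypothetical description of a single-AZ AutoScalingGroup."""
    name: str
    availability_zone: str
    min_size: int
    max_size: int


def per_az_asgs(node_group: str, zones: List[str],
                min_per_az: int, max_per_az: int) -> List[AsgSpec]:
    """Return one ASG spec per AZ instead of one spec spanning all AZs."""
    return [
        AsgSpec(
            name=f"{node_group}-{az}",  # assumed naming scheme
            availability_zone=az,       # exactly one AZ per group
            min_size=min_per_az,
            max_size=max_per_az,
        )
        for az in zones
    ]


# Example values only; zone names and sizes are assumptions.
specs = per_az_asgs("general", ["eu-west-2a", "eu-west-2b", "eu-west-2c"], 1, 5)
for spec in specs:
    print(spec)
```

With groups shaped like this, each one can grow by a single node in a single AZ, and the cluster autoscaler's --balance-similar-node-groups option is what keeps the per-AZ groups roughly even.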
chrisfarms commented
Resolved by #600 and friends. We have left the kiam/ci nodes as is, spanning all AZs, since these are not autoscaled.