terraform-aws-modules/terraform-aws-eks

worker_groups_launch_template failed to update asg_desired_capacity number

brant4test opened this issue · 8 comments

I have issues

I'm submitting a...

  • bug report

What is the current behavior?

I updated asg_desired_capacity from 2 to 4 and asg_max_size from 4 to 5. Only asg_max_size was applied to the Spot ASG; asg_desired_capacity stayed at 2.

$ vi terraform-aws-modules\terraform-aws-eks\examples\eks_test_fixture\main.tf
...
  worker_groups_launch_template = [
    {
      # This will launch an autoscaling group with only Spot Fleet instances
      name                                     = "workers_group_launch_template_spot"
      instance_type                            = "m4.large"
      override_instance_type                   = "t2.large"
      asg_desired_capacity                     = "4"
      asg_max_size                             = "5"
      asg_min_size                             = "1"
...

$ terraform plan

  ~ update in-place
Terraform will perform the following actions:
  ~ module.eks.aws_autoscaling_group.workers_launch_template
      max_size: "4" => "5"
Plan: 0 to add, 1 to change, 0 to destroy.

$ terraform apply

module.eks.aws_autoscaling_group.workers_launch_template: Modifying... (ID: test-eks-AbEd-workers_group_launch_template_spot20190222043445948300000014)
  max_size: "4" => "5"
module.eks.aws_autoscaling_group.workers_launch_template: Modifications complete after 0s (ID: test-eks-AbEd-workers_group_launch_template_spot20190222043445948300000014)

What's the expected behavior?

Both asg_desired_capacity = 4 and asg_max_size = 5 should have been applied in AWS.
Any tips?

Are you able to fix this problem and submit a PR? Link here if you have already.

Environment details

  • Affected module version: v2.2.1
  • OS: Ubuntu 16.04.3 LTS
  • Terraform version: Terraform v0.11.11

Any other relevant info

Thanks!

Working as intended. Desired capacity isn't changed after initial creation because it would conflict with runtime autoscalers like cluster-autoscaler.

See https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/autoscaling.md

@confiq Anything I missed?
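For readers unfamiliar with the mechanism, the behaviour described above is the kind produced by a Terraform lifecycle rule on the autoscaling group. The block below is a minimal sketch in Terraform 0.11 syntax, not a copy of the module source; the resource name, `var.subnet_ids`, and `aws_launch_template.example` are hypothetical and assumed to be defined elsewhere.

```hcl
# Sketch only: a standalone ASG whose desired_capacity is set at creation
# and then left to runtime autoscalers such as cluster-autoscaler.
resource "aws_autoscaling_group" "example" {
  name_prefix         = "example-"
  min_size            = 1
  max_size            = 5
  desired_capacity    = 2
  vpc_zone_identifier = ["${var.subnet_ids}"] # hypothetical variable

  launch_template {
    id      = "${aws_launch_template.example.id}" # hypothetical resource
    version = "$Latest"
  }

  lifecycle {
    # Later changes to desired_capacity in the configuration produce no diff,
    # so only edits such as max_size show up in `terraform plan`.
    ignore_changes = ["desired_capacity"]
  }
}
```

With a rule like this in place, a plan that changes both max_size and desired_capacity would list only max_size, which matches the plan output in the report.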

I would not call this a bug. It may be poorly documented, but it is not a bug...

Thank you! I haven't installed cluster-autoscaler yet, but will do so soon.

@brant4test I ended up appending the capacity to the name so that a change in desired capacity forces a re-create. Combined with asg_force_delete on the worker group, this lets me force a group to 0, which removes the nodes. It has proved useful for my blue/green worker groups: after I have drained one group and moved its pods to the other, I can force the old group to spin down and ensure that it is no longer visible to the autoscaler.

"name", "foo-${var.desired_capacity}",

The flow to move from blue to green then looks like:

  1. Set min and max of the blue group to 0. This ensures the autoscaler ignores this group for any new scale-up events, but the nodes will not be removed because of instance protection.
  2. Add enough capacity to the green group and wait for it to come up
  3. Start draining nodes in the blue group
  4. Once the nodes are drained and all pods have moved over to the green nodes, set the desired capacity of the blue group to 0. In combination with the name change and asg_force_delete, this removes those instances.
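As a rough illustration, the configuration at the end of step 4 might look like the sketch below (Terraform 0.11 syntax). This is one reading of the flow above, not the commenter's actual code: the variable names, the group names "blue" and "green", and the capacity values are all hypothetical. The local would be passed to the module as `worker_groups_launch_template = "${local.worker_groups_launch_template}"`.

```hcl
# Hypothetical variables; changing either one changes the group name,
# which forces the corresponding ASG to be re-created.
variable "blue_desired_capacity" {
  default = "0"
}

variable "green_desired_capacity" {
  default = "4"
}

locals {
  worker_groups_launch_template = [
    {
      # Blue group after step 4: min/max already 0 (step 1), desired now 0.
      name                 = "blue-${var.blue_desired_capacity}"
      asg_min_size         = "0"
      asg_max_size         = "0"
      asg_desired_capacity = "${var.blue_desired_capacity}"
      asg_force_delete     = true
    },
    {
      # Green group carrying the workload (step 2). Sizes are placeholders.
      name                 = "green-${var.green_desired_capacity}"
      asg_min_size         = "1"
      asg_max_size         = "5"
      asg_desired_capacity = "${var.green_desired_capacity}"
      asg_force_delete     = true
    },
  ]
}
```

The name suffix is what makes the desired-capacity change visible to Terraform; without it, the change would be ignored as described earlier in the thread.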

Was handy for us so thought I would share.

So there is no way to resize using Terraform? I understand the issues with the autoscaler, but ignoring desired_capacity should be optional.

We are trying to set up blue/green, which means always having 2 ASGs running, one with zero size and the other with the desired size. Every time we need to update the AMI, we just move things to the other ASG, which requires resizing both groups.

So there is no way to resize using Terraform?

Correct.

Currently it's not possible to support both ways, because lifecycle rules in Terraform don't support interpolation. Therefore we support the most common and correct method, which is to use the cluster-autoscaler.
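To make the limitation concrete, here is a small sketch in Terraform 0.11 syntax. The variable name and resource name are hypothetical, and the resource arguments are trimmed to the essentials (a real ASG also needs a launch template or launch configuration and subnets). Terraform processes lifecycle settings too early for expressions to be evaluated, so only literal values are accepted there.

```hcl
# Hypothetical toggle a caller might want to expose.
variable "ignore_desired_capacity" {
  default = "true"
}

resource "aws_autoscaling_group" "sketch" {
  name_prefix      = "sketch-"
  min_size         = 1
  max_size         = 5
  desired_capacity = 2

  lifecycle {
    # NOT accepted by Terraform: lifecycle arguments must be literals,
    # so the toggle above cannot be wired in like this.
    # ignore_changes = ["${var.ignore_desired_capacity ? "desired_capacity" : ""}"]

    # The rule therefore has to be hard-coded one way or the other.
    ignore_changes = ["desired_capacity"]
  }
}
```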

Also worth noting that scaling down the ASG is usually a manual process anyway. To do this safely you need to drain the node and then terminate it. If you are scaling the cluster down solely by changing the ASG min/max/desired, then the ASG will just terminate some instances without draining, which in many situations is not desirable. But I understand that in your use case it might be OK.

Hope that makes sense!

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.