Changing the spec does not trigger an update.
biguatch opened this issue · 17 comments
I have the following main.tf
resource "eksctl_cluster" "preprod" {
eksctl_bin = "eksctl"
eksctl_version = "0.29.2"
name = "${local.resource_prefix}-eks-cluster"
region = var.region
vpc_id = var.vpc_id
version = "1.14"
spec = templatefile(local.config_file, merge(var.eks_config_placeholders, {
__node_group_role_arn__ = aws_iam_role.node-group-role.arn
__node_group_role_profile_arn__ = aws_iam_instance_profile.node-group-role-profile.arn
__cluster_role_arn__ = aws_iam_role.cluster-role.arn
__project_tags__ : indent(6, yamlencode(var.project_tags))
}))
depends_on = [
aws_iam_instance_profile.cluster-role-profile,
aws_iam_instance_profile.node-group-role-profile
]
}
and the template file is
vpc:
  subnets:
    private:
      ${__az0__}:
        id: ${__sb0__}
      ${__az1__}:
        id: ${__sb1__}

cloudWatch:
  clusterLogging:
    enableTypes:
      [ "audit", "authenticator", "controllerManager", "scheduler", "api" ]

iam:
  serviceRoleARN: ${__cluster_role_arn__}

nodeGroups:
  - name: ${__spot_big_ng_name__}
    minSize: 0
    maxSize: 10
    desiredCapacity: 1
    privateNetworking: true
    instancesDistribution:
      instanceTypes: [ "t2.xlarge" ]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
      spotAllocationStrategy: "capacity-optimized"
    labels:
      lifecycle: Ec2Spot
      nodegroup-role: worker
      node-role.spot-worker: "true"
    tags:
      ${__project_tags__}
      k8s.io/cluster-autoscaler/node-template/label/lifecycle: Ec2Spot
      k8s.io/cluster-autoscaler/node-template/label/intent: apps
    iam:
      instanceRoleARN: ${__node_group_role_arn__}
      instanceProfileARN: ${__node_group_role_profile_arn__}

  - name: ${__spot_small_ng_name__}
    minSize: 0
    maxSize: 10
    desiredCapacity: 1
    privateNetworking: true
    instancesDistribution:
      instanceTypes: [ "t2.xlarge" ]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
      spotAllocationStrategy: "capacity-optimized"
    labels:
      lifecycle: Ec2Spot
      nodegroup-role: worker
      node-role.spot-worker: "true"
    tags:
      ${__project_tags__}
      k8s.io/cluster-autoscaler/node-template/label/lifecycle: Ec2Spot
      k8s.io/cluster-autoscaler/node-template/label/intent: apps
    iam:
      instanceRoleARN: ${__node_group_role_arn__}
      instanceProfileARN: ${__node_group_role_profile_arn__}

managedNodeGroups:
  - name: ${__managed_addon_ng_name__}
    instanceType: t2.xlarge
    privateNetworking: true
    minSize: 2
    desiredCapacity: 2
    maxSize: 10
    volumeSize: 50
    labels:
      role: add-ons-platform
    tags:
      ${__project_tags__}
      nodegroup-role: worker
      lifecycle: OnDemand
    iam:
      instanceRoleARN: ${__node_group_role_arn__}

  - name: ${__managed_worker_ng_name__}
    instanceType: t2.xlarge
    privateNetworking: true
    minSize: 2
    desiredCapacity: 2
    maxSize: 10
    volumeSize: 50
    labels:
      role: worker
      node-role.worker: "true"
    tags:
      ${__project_tags__}
      nodegroup-role: worker
      lifecycle: OnDemand
      k8s.io/cluster-autoscaler/node-template/taint/onDemandInstance: "true:PreferNoSchedule"
    iam:
      instanceRoleARN: ${__node_group_role_arn__}

  - name: ${__managed_addon_monitoring_ng_name__}
    instanceType: t2.xlarge
    privateNetworking: true
    minSize: 2
    desiredCapacity: 2
    maxSize: 5
    volumeSize: 50
    labels:
      role: add-ons-monitoring
    tags:
      ${__project_tags__}
      nodegroup-role: worker
      lifecycle: OnDemand
    iam:
      instanceRoleARN: ${__node_group_role_arn__}
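(For context on the __project_tags__ placeholder: indent(6, yamlencode(var.project_tags)) renders the tag map as one quoted key/value pair per line, indented six spaces so it lines up under tags:. With a hypothetical project_tags = { Project = "preprod", Team = "platform" }, a nodegroup's tags: block would render roughly like this:

    tags:
      "Project": "preprod"
      "Team": "platform"
      k8s.io/cluster-autoscaler/node-template/label/lifecycle: Ec2Spot
      k8s.io/cluster-autoscaler/node-template/label/intent: apps

The tag values above are illustrative only.)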
When I change the minSize/desiredCapacity/maxSize and run terraform apply, nothing changes in the AWS state, even though terraform plan shows the following:
- minSize: 2
- desiredCapacity: 2
+ minSize: 3
+ desiredCapacity: 3
@biguatch Thanks for reporting! Unfortunately, I currently believe this is rather an eksctl issue.
eksctl scale nodegroup is the correct command to run for this kind of update, but when scaling down it lacks the ability to drain the nodes being terminated, which is crucial for availability. See https://eksctl.io/usage/managing-nodegroups/#scaling
So, until this is fixed in eksctl, the best workaround is to define a similar nodegroup under a new name in your spec (= mostly cluster.yaml), so that terraform apply creates the newer one and then drains/removes the older one, leaving you with only the up-to-date nodegroup carrying your new settings (minSize, desiredCapacity).
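A minimal sketch of that workaround, with hypothetical nodegroup names: change the nodegroup's name in the templated spec while keeping (or updating) its sizing, so the next terraform apply creates the new group and then drains and deletes the old one.

nodeGroups:
  # the old group (e.g. "spot-big-v1") is simply removed from the spec
  - name: spot-big-v2          # hypothetical new name; triggers create-new / drain-old
    minSize: 0
    maxSize: 10
    desiredCapacity: 3         # the new desired settings
    privateNetworking: true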
I am going to try renaming the groups, run apply, and see what happens.
But even if I don't scale the groups up or down and only add/remove tags, those changes are not applied to the target.
@mumoshu I renamed the groups, assuming eksctl would drain and remove the old groups and create new ones. eksctl created the new ones as expected, but it is stuck while draining with the following log:
2020-10-12T06:51:29.758+0100 [DEBUG] plugin.terraform-provider-eksctl_v0.8.5: 2020/10/12 06:51:29 [DEBUG] eksctl: "[!] ignoring DaemonSet-managed Pods: kube-system/aws-node-8jjnz, kube-system/aws-node-termination-handler-2wr6g, kube-system/kube-proxy-lk9c5, kube-system/kube2iam-sbjhc, logging-system/fluentd-cloudwatch-65djn, logging-system/loki-preprod-promtail-frc7t, monitoring-system/prometheus-operator-prometheus-node-exporter-jrvkq"
2020-10-12T06:51:29.792+0100 [DEBUG] plugin.terraform-provider-eksctl_v0.8.5: 2020/10/12 06:51:29 [DEBUG] eksctl: "[!] pod eviction error (\"Cannot evict pod as it would violate the pod's disruption budget.\") on node ip-172-20-1-93.eu-west-1.compute.internal – will retry after delay of 5s"
Am I missing something? (found this eksctl-io/eksctl#693)
@biguatch Could you check that all the pods listed in the log have a corresponding Deployment or DaemonSet, and have at least N replicas wherever a PDB is configured to keep N replicas available?
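For illustration, a hedged example (names are hypothetical) of the situation being asked about: a Deployment with a single replica combined with a PDB that requires one pod to stay available can never satisfy an eviction, so the drain keeps retrying with exactly the "would violate the pod's disruption budget" error above.

apiVersion: policy/v1beta1      # policy/v1 on Kubernetes 1.21+
kind: PodDisruptionBudget
metadata:
  name: example-app-pdb         # hypothetical
spec:
  minAvailable: 1               # at least one matching pod must stay available...
  selector:
    matchLabels:
      app: example-app          # ...but if the matching Deployment only runs
                                # replicas: 1, evicting that single pod would
                                # violate the budget, so the drain is blocked

Raising the Deployment's replicas to at least 2 (or relaxing the PDB) lets the eviction, and therefore the drain, proceed.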
only add/remove tags, those changes are not applied to the target.
@biguatch Seems like an eksctl update cluster call is missing in the provider for this. I'll implement it shortly. Thanks!
You still need to check your PDBs for the drain issues though.
I will do some more testing soon.
PS: Thanks for the quick responses!
@biguatch I have some bad news. Neither eksctl update cluster nor eksctl upgrade cluster is able to add tags via an update. So the best the provider can do is trigger a cluster replacement when tags are changed.
@biguatch FYI, I've released v0.9.0 to (1) add the ability to automatically run eksctl update cluster for a K8s version update on update, and (2) mark eksctl_cluster to be recreated and replaced on tags update.
@mumoshu Thanks for the quick release. Just to clarify: if I change the tags of the cluster, does that mean it is going to destroy the existing cluster, worker nodes, etc. and recreate them?
@biguatch Exactly.
The provider runs eksctl create cluster on create and eksctl delete cluster on destroy. Terraform recreates a resource by calling destroy and then create, and it triggers a recreate when the user modifies an attribute marked as "force-new". eksctl_cluster.tags is now marked force-new, hence the cluster is recreated on a tags update.
@mumoshu Wow, that is too extreme :) I will keep my shell scripts to tag the cluster for now.
@biguatch Agreed. We need more upvotes on eksctl-io/eksctl#731!
Seeing a similar issue, where bumping the version does not trigger an upgrade: the plan shows the corresponding diff, but after terraform apply-ing it, the cluster stays on the same old version.
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # module.eks.eksctl_cluster.this will be updated in-place
  ~ resource "eksctl_cluster" "this" {
        id      = "XXXXXXXXX"
        name    = "test-wvMKBVUL"
        tags    = {}
      ~ version = "1.18" -> "1.19"
        # (9 unchanged attributes hidden)
        # (2 unchanged blocks hidden)
    }
Plan: 0 to add, 1 to change, 0 to destroy.
kubectl version --short
Server Version: v1.18.9-eks-d1db3c
Exactly the same problem here. It marks the resource as "eksctl_cluster.eks-cluster will be updated in-place", and I can see that the version should be changed:
~ version = "1.18" -> "1.19"
but after apply it takes 25 seconds to "do" that, and then nothing changes. The versions of EKS and the nodes are all the same as before:
eksctl_cluster.eks-cluster: Modifying... [id=c21frchnlodc2hl1e3dg]
eksctl_cluster.eks-cluster: Still modifying... [id=c21frchnlodc2hl1e3dg, 10s elapsed]
eksctl_cluster.eks-cluster: Still modifying... [id=c21frchnlodc2hl1e3dg, 20s elapsed]
eksctl_cluster.eks-cluster: Modifications complete after 25s [id=c21frchnlodc2hl1e3dg]
Releasing state lock. This may take a few moments...
Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
Is there even a way that we can use this provider to update EKS clusters?
Thx!
EDIT: Is there a chance that the --approve flag is missing in the module, and therefore it only does a dry run but does not actually apply the changes?