[Bug] Addons get deployed before associated IAM roles are created
What were you trying to accomplish?
I am trying to create a new cluster with the vpc-cni addon configured against an IAM role that is created on the fly (to avoid #7951). Currently, the order of CloudFormation stacks is:
- addons
- managed node groups
- service accounts (and associated iam roles)
Because of this, the IAM role that should exist for the vpc-cni addon does not exist yet, so the vpc-cni plugin never gets its pods created. Since the managed node group stack comes next, the node group fails to be marked ready by EKS because the vpc-cni addon is not yet ready. As a result, cluster creation fails.
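The relevant coupling is just two parts of the config below: the vpc-cni addon references a role ARN that eksctl itself is expected to create from iam.serviceAccounts, but that stack is deployed last. Trimmed excerpt:

```yaml
iam:
  serviceAccounts:
    - metadata:
        name: prod-apn1-vpc-cni-role
        namespace: kube-system
      attachPolicyARNs:
        - "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"

addons:
  - name: vpc-cni
    serviceAccountRoleARN: arn:aws:iam::1234567890:role/prod-apn1-vpc-cni-role
```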
What happened?
Cluster creation failed.
How to reproduce it?
```shell
eksctl create cluster -v 5 -f cluster.yaml
```
Contents of cluster.yaml below:
```yaml
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: prod
  region: ap-northeast-1
  version: "1.30"
  tags:
    environment: prod
    managed-by: eksctl

iam:
  serviceAccounts:
    - metadata:
        name: prod-apn1-ebs-csi-driver-role
        namespace: kube-system
      attachPolicyARNs:
        - "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
      tags:
        environment: prod
        managed-by: eksctl
    - metadata:
        name: prod-apn1-vpc-cni-role
        namespace: kube-system
      attachPolicyARNs:
        - "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
      tags:
        environment: prod
        managed-by: eksctl
  withOIDC: true

fargateProfiles:
  - name: default
    selectors:
      - namespace: default
      - namespace: kube-system
  - name: ingress-nginx
    selectors:
      - namespace: ingress-nginx

vpc:
  cidr: 10.129.0.0/16
  autoAllocateIPv6: true
  hostnameType: resource-name
  clusterEndpoints:
    publicAccess: true
    privateAccess: true

cloudWatch:
  clusterLogging:
    enableTypes: ["audit", "authenticator", "controllerManager"]
    logRetentionInDays: 60

managedNodeGroups:
  - name: airflow
    labels:
      role: airflow
    tags:
      environment: prod
      managed-by: eksctl
    instanceType: t3.xlarge
    minSize: 1
    maxSize: 6
    desiredCapacity: 1
    volumeSize: 280
    privateNetworking: true
    iam:
      withAddonPolicies:
        appMesh: true
        appMeshPreview: true
        autoScaler: true
        awsLoadBalancerController: true
        certManager: true
        cloudWatch: true
        ebs: true
        efs: true
        externalDNS: true
        fsx: true
        imageBuilder: true
        xRay: true

addons:
  - name: aws-ebs-csi-driver
    serviceAccountRoleARN: arn:aws:iam::1234567890:role/prod-apn1-ebs-csi-driver-role
  - name: coredns
  - name: kube-proxy
  - name: eks-pod-identity-agent
  - name: vpc-cni
    serviceAccountRoleARN: arn:aws:iam::1234567890:role/prod-apn1-vpc-cni-role
```
Logs
https://gist.github.com/josegonzalez/b9b9b5bd0f82603ffe5c60db00232094
Anything else we need to know?
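A possible interim workaround (only a sketch; the stripped-down config file name is illustrative and I have not verified this end to end) would be to split creation into phases so the IRSA roles exist before the addons reference them:

```shell
# 1. Create the cluster from a copy of the config that omits the
#    serviceAccountRoleARN fields on the addons (illustrative file name).
eksctl create cluster -v 5 -f cluster-without-addon-roles.yaml

# 2. With the cluster and OIDC provider in place, create the IAM service
#    accounts and their roles from the full config.
eksctl create iamserviceaccount -f cluster.yaml --approve

# 3. Re-apply the addons so they pick up the serviceAccountRoleARN values.
eksctl update addon -f cluster.yaml
```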
Versions
```
% eksctl info
eksctl version: 0.191.0-dev+c736924d6.2024-09-27T00:54:42Z
kubectl version: v1.31.1
OS: darwin
```
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This is still a bug.