eksctl-io/eksctl

[Bug] Addons get deployed before associated IAM roles are created

Opened this issue · 2 comments

What were you trying to accomplish?

I am trying to create a new cluster with the vpc-cni addon configured against a role that is created on the fly (to avoid #7951). Currently, the order of the CloudFormation stacks is:

  • addons
  • managed node groups
  • service accounts (and associated IAM roles)

Because of this, the IAM role that should exist for the vpc-cni addon doesn't exist yet, so the vpc-cni plugin never has its pods created. Since the managed node groups stack comes next, the node group fails to be marked ready by EKS because the vpc-cni addon is not yet ready. Thus, cluster creation fails.
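
For illustration only, one manual sequencing that sidesteps the missing-role problem might look like the sketch below. It assumes the config is split so that a hypothetical cluster-base.yaml contains only the cluster itself (no addons or managed node groups), with everything else created afterwards from the full cluster.yaml; this is a workaround sketch, not the ordering eksctl actually uses today:

# hypothetical split config: cluster-base.yaml has no addons/managedNodeGroups
eksctl create cluster -f cluster-base.yaml
# create the IRSA roles first so the ARNs referenced by serviceAccountRoleARN exist
eksctl create iamserviceaccount -f cluster.yaml --approve
# the addons can now attach to roles that already exist
eksctl create addon -f cluster.yaml
# node groups last, once vpc-cni is healthy
eksctl create nodegroup -f cluster.yaml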

What happened?

The cluster failed to be created successfully.

How to reproduce it?

eksctl create cluster -v 5 -f cluster.yaml

Contents of cluster.yaml below:

---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: prod
  region: ap-northeast-1
  version: "1.30"
  tags:
    environment: prod
    managed-by: eksctl

iam:
  serviceAccounts:
  - metadata:
      name: prod-apn1-ebs-csi-driver-role
      namespace: kube-system
    attachPolicyARNs:
    - "arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy"
    tags:
      environment: prod
      managed-by: eksctl
  - metadata:
      name: prod-apn1-vpc-cni-role
      namespace: kube-system
    attachPolicyARNs:
    - "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
    tags:
      environment: prod
      managed-by: eksctl
  withOIDC: true

fargateProfiles:
  - name: default
    selectors:
      - namespace: default
      - namespace: kube-system
  - name: ingress-nginx
    selectors:
      - namespace: ingress-nginx

vpc:
  cidr: 10.129.0.0/16
  autoAllocateIPv6: true
  hostnameType: resource-name
  clusterEndpoints:
    publicAccess: true
    privateAccess: true

cloudWatch:
  clusterLogging:
    enableTypes: ["audit", "authenticator", "controllerManager"]
    logRetentionInDays: 60

managedNodeGroups:
  - name: airflow
    labels:
      role: airflow
    tags:
      environment: prod
      managed-by: eksctl
    instanceType: t3.xlarge
    minSize: 1
    maxSize: 6
    desiredCapacity: 1
    volumeSize: 280
    privateNetworking: true
    iam:
      withAddonPolicies:
        appMesh: true
        appMeshPreview: true
        autoScaler: true
        awsLoadBalancerController: true
        certManager: true
        cloudWatch: true
        ebs: true
        efs: true
        externalDNS: true
        fsx: true
        imageBuilder: true
        xRay: true

addons:
- name: aws-ebs-csi-driver
  serviceAccountRoleARN: arn:aws:iam::1234567890:role/prod-apn1-ebs-csi-driver-role
- name: coredns
- name: kube-proxy
- name: eks-pod-identity-agent
- name: vpc-cni
  serviceAccountRoleARN: arn:aws:iam::1234567890:role/prod-apn1-vpc-cni-role

Logs

https://gist.github.com/josegonzalez/b9b9b5bd0f82603ffe5c60db00232094

Anything else we need to know?

Versions

% eksctl info
eksctl version: 0.191.0-dev+c736924d6.2024-09-27T00:54:42Z
kubectl version: v1.31.1
OS: darwin

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

This is still a bug.