awslabs/amazon-eks-ami

Missing c7i-flex.* instance types prevent nodes from joining the cluster

eiqops opened this issue · 3 comments

What happened:

My nodes can't join the cluster because the script in user data is returning an error code.

User Data Script

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash -xe
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
/etc/eks/bootstrap.sh 'my-cluster' --apiserver-endpoint 'removed_it_to_share_it_on_github' --b64-cluster-ca 'removed_it_to_share_it_on_github' \
--container-runtime containerd \
--dns-cluster-ip '172.20.0.10' \
--use-max-pods false \
--kubelet-extra-args '--max-pods=29'
--//--
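For context on the `--max-pods=29` value above: the header of eni-max-pods.txt documents the mapping as `# of ENI * (# of IPv4 per ENI - 1) + 2`. A minimal sketch of that arithmetic follows. The function name `max_pods_for` is hypothetical, and the inputs (3 ENIs, 10 IPv4 addresses per ENI) are an assumption for a `.large` instance, not values confirmed in this issue.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bootstrap.sh): apply the formula from the
# eni-max-pods.txt header.
#   max pods = ENIs * (IPv4 addresses per ENI - 1) + 2
# The first IP on each ENI is not used for pods; +2 covers host-networking pods.
max_pods_for() {
  local enis="$1" ips_per_eni="$2"
  echo $(( enis * (ips_per_eni - 1) + 2 ))
}

# Assumed limits for illustration only: 3 ENIs, 10 IPv4 addresses per ENI.
max_pods_for 3 10   # prints 29
```

If the real ENI limits for the instance type differ, the result changes accordingly, so they are worth checking against the EC2 documentation before hard-coding a value.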

Logs:

cat /var/log/cloud-init-output.log

2024-05-14 17:08:40,380 - cc_scripts_user.py[WARNING]: Failed to run module scripts_user (scripts in /var/lib/cloud/instance/scripts)
2024-05-14 17:08:40,381 - util.py[WARNING]: Running module scripts_user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed

cat /var/log/user-data.log

...

2024-05-14T17:08:40+0000 [eks-bootstrap] INFO: Using IP family: ipv4
2024-05-14T17:08:40+0000 [eks-bootstrap] INFO: No entry for type 'c7i-flex.large' in /etc/eks/eni-max-pods.txt. Will attempt to auto-discover value.
/etc/eks/max-pods-calculator.sh: line 41: a1.2xlarge: command not found
/etc/eks/max-pods-calculator.sh: line 42: a1.4xlarge: command not found
/etc/eks/max-pods-calculator.sh: line 43: a1.large: command not found

...
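The "No entry for type" message above suggests a lookup keyed on the instance type. A rough sketch of such a lookup is shown below, assuming a file with one `<instance-type> <max-pods>` pair per line and `#` comment lines. This is not the actual bootstrap.sh code: `lookup_max_pods` is a hypothetical name and the sample table is illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the max-pods value for an instance type, or
# return non-zero when the table has no entry for it.
lookup_max_pods() {
  local instance_type="$1" table="$2"
  # Ignore comment lines, match the first column exactly, print the second.
  awk -v t="$instance_type" '
    $1 !~ /^#/ && $1 == t { print $2; found = 1; exit }
    END { exit !found }
  ' "$table"
}

# Illustrative sample table; the real file lives at /etc/eks/eni-max-pods.txt.
table="$(mktemp)"
cat > "$table" <<'EOF'
# instance-type max-pods (sample entries only)
a1.large 29
a1.medium 8
EOF

for type in a1.large c7i-flex.large; do
  if max_pods="$(lookup_max_pods "$type" "$table")"; then
    echo "$type: $max_pods"
  else
    echo "$type: no entry, a fallback value is needed"
  fi
done
rm -f "$table"
```

Running a check like this against the table baked into a custom AMI shows quickly whether a new instance family is missing from it.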

What you expected to happen:

The node(s) should join the cluster normally, as they do with non-c7i-flex.* instance types.

How to reproduce it (as minimally and precisely as possible):

  1. Create an EKS cluster
  2. Include the current file at templates/shared/runtime/eni-max-pods.txt on your AMI (golden or vanilla)
  3. Set up the user data script to call /etc/eks/bootstrap.sh 'my-cluster' ...
  4. Create a new node using a c7i-flex.* instance type
  5. Check the logs

Anything else we need to know?:

The issue happens only with c7i-flex.* instance types.

Environment:

  • AWS Region: several regions (ca-central-1, ap-southeast-1, etc)

  • Instance Type(s): any c7i-flex.*

  • EKS Platform version: eks.12

  • Kubernetes version: 1.28.9

  • AMI Version: Golden AMI based on Ubuntu 22.04

  • Kernel: Linux ip-X-X-X-X 6.5.0-1018-aws #18~22.04.1-Ubuntu SMP Fri Apr 5 17:44:33 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

  • Release information (run cat /etc/eks/release on a node):

I'm not able to reproduce this. What do you mean by "Golden AMI based on Ubuntu 22.04"? Is this an AMI you built yourself?

Hi @cartermckinnon, thank you for looking into it.

"Is this an AMI you built yourself?"

Yes, it is an AMI we built ourselves.

The attached source.zip contains bootstrap.sh and the other files we put into the AMI so that we can run /etc/eks/bootstrap.sh ... as the user data script.

Fixed after #1802, thank you :)