Missing c7i-flex.* instance types is causing nodes not to join the cluster
eiqops opened this issue · 3 comments
What happened:
My nodes can't join the cluster because the script in user data is returning an error code.
User Data Script
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="//"

--//
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash -xe
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
/etc/eks/bootstrap.sh 'my-cluster' --apiserver-endpoint 'removed_it_to_share_it_on_github' --b64-cluster-ca 'removed_it_to_share_it_on_github' \
--container-runtime containerd \
--dns-cluster-ip '172.20.0.10' \
--use-max-pods false \
--kubelet-extra-args '--max-pods=29'
--//--
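For context on the `--max-pods=29` value passed above: the header of the upstream eni-max-pods.txt describes the default mapping as ENIs * (IPv4 addresses per ENI - 1) + 2. The sketch below only illustrates that arithmetic. The function name `max_pods` is hypothetical, and the figures used for c7i-flex.large (3 ENIs, 10 IPv4 addresses per ENI) are an assumption that should be confirmed against the EC2 documentation before relying on them.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of bootstrap.sh.
# Default max-pods formula: ENIs * (IPv4 addresses per ENI - 1) + 2
max_pods() {
  local enis="$1" ips_per_eni="$2"
  echo $(( enis * (ips_per_eni - 1) + 2 ))
}
```

Under the assumed figures, `max_pods 3 10` prints 29, which matches the value set in the user data script.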
Logs:
cat /var/log/cloud-init-output.log
2024-05-14 17:08:40,380 - cc_scripts_user.py[WARNING]: Failed to run module scripts_user (scripts in /var/lib/cloud/instance/scripts)
2024-05-14 17:08:40,381 - util.py[WARNING]: Running module scripts_user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed
cat /var/log/user-data.log
...
2024-05-14T17:08:40+0000 [eks-bootstrap] INFO: Using IP family: ipv4
2024-05-14T17:08:40+0000 [eks-bootstrap] INFO: No entry for type 'c7i-flex.large' in /etc/eks/eni-max-pods.txt. Will attempt to auto-discover value.
/etc/eks/max-pods-calculator.sh: line 41: a1.2xlarge: command not found
/etc/eks/max-pods-calculator.sh: line 42: a1.4xlarge: command not found
/etc/eks/max-pods-calculator.sh: line 43: a1.large: command not found
...
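The `No entry for type` message can be confirmed directly on a node. The sketch below assumes eni-max-pods.txt follows the upstream layout (one `<instance-type> <max-pods>` pair per line, with `#` starting a comment). `lookup_max_pods` is a hypothetical name and is not part of bootstrap.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the max-pods value recorded for an instance type.
# Returns non-zero when the file has no entry for that type.
lookup_max_pods() {
  local instance_type="$1" file="${2:-/etc/eks/eni-max-pods.txt}"
  awk -v t="$instance_type" '
    /^[[:space:]]*#/ { next }               # skip comment lines
    $1 == t { print $2; found = 1; exit }   # first exact match wins
    END { exit found ? 0 : 1 }
  ' "$file"
}
```

For example, `lookup_max_pods c7i-flex.large || echo "no entry"` reports whether the file baked into the AMI covers the instance type.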
What you expected to happen:
The node(s) should join the cluster normally, as they do with instance types other than c7i-flex.*.
How to reproduce it (as minimally and precisely as possible):
- Create an EKS cluster
- Include the current file at templates/shared/runtime/eni-max-pods.txt on your AMI (golden or vanilla)
- Set up the user data script to use /etc/eks/bootstrap.sh 'my-cluster' ...
- Create a new node using a c7i-flex.* instance type
- Check the logs
Anything else we need to know?:
The issue happens only with c7i-flex.* instance types.
Environment:
- AWS Region: several regions (ca-central-1, ap-southeast-1, etc.)
- Instance Type(s): any c7i-flex.*
- EKS Platform version: eks.12
- Kubernetes version: 1.28.9
- AMI Version: Golden AMI based on Ubuntu 22.04
- Kernel: Linux ip-X-X-X-X 6.5.0-1018-aws #18~22.04.1-Ubuntu SMP Fri Apr 5 17:44:33 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
- Release information (run cat /etc/eks/release on a node):
I'm not able to reproduce this. What do you mean by "Golden AMI based on Ubuntu 22.04"? Is this an AMI you built yourself?
Hi @cartermckinnon, thank you for looking into it.
> Is this an AMI you built yourself?
Yes, we built the AMI ourselves.
The file source.zip contains bootstrap.sh and the other files we put into the AMI so that we can run /etc/eks/bootstrap.sh ... as the user data script.