eksctl-io/eksctl

[Bug] preBootstrapCommands is not working in AL2023

xiangyanw opened this issue · 11 comments

What were you trying to accomplish?

I want to mount a data volume on an EKS node running AL2023 by using preBootstrapCommands.

What happened?

I configured preBootstrapCommands for a managed nodegroup in EKS version 1.30, but those commands were not added to the userdata.

Here is my preBootstrapCommands:

    preBootstrapCommands:
      - "sudo mkfs.xfs /dev/nvme1n1; sudo mkdir -p /var/lib/containerd ;sudo echo /dev/nvme1n1 /var/lib/containerd xfs defaults,noatime 1 2 >> /etc/fstab"
      - "sudo mount -a"
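
As a side note on the commands themselves, the fstab append above adds a new line each time it runs. A minimal sketch of a re-runnable variant follows; the function name `ensure_fstab_entry` and its optional third argument are hypothetical and only for illustration, and the mount options are copied from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of eksctl): append an fstab line only if the
# device is not already listed, so re-running does not duplicate the entry.
# Args: <device> <mountpoint> [fstab path, defaults to /etc/fstab]
ensure_fstab_entry() {
  local device="$1" mountpoint="$2" fstab="${3:-/etc/fstab}"
  # Look for a line that starts with the device name followed by whitespace.
  if ! grep -qE "^${device}[[:space:]]" "$fstab" 2>/dev/null; then
    echo "${device} ${mountpoint} xfs defaults,noatime 1 2" >> "$fstab"
  fi
}
```

Called as `ensure_fstab_entry /dev/nvme1n1 /var/lib/containerd`, this would stand in for the `echo ... >> /etc/fstab` step; formatting and mounting are left as in the original commands.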

Here is the resulting userdata in the launch template:

    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary=78e7aff85774192583069ede05ed2bd166f9168b5ca780bcb90184ac8c40

    --78e7aff85774192583069ede05ed2bd166f9168b5ca780bcb90184ac8c40
    Content-Type: text/x-shellscript
    Content-Type: charset="us-ascii"

    #!/bin/bash

    set -o errexit
    set -o pipefail
    set -o nounset

    touch /run/xtables.lock

    --78e7aff85774192583069ede05ed2bd166f9168b5ca780bcb90184ac8c40--
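
For anyone wanting to confirm the same symptom on their own nodegroup, here is a minimal sketch of a check against the rendered userdata. It assumes GNU `base64` and that the userdata is available base64-encoded on stdin (which is how the EC2 API returns it); the function name `userdata_contains` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the base64-encoded userdata read from
# stdin contains the literal string passed as the first argument.
userdata_contains() {
  local needle="$1"
  # Decode, then search for the fixed string; the exit status is grep's.
  base64 -d | grep -qF -- "$needle"
}
```

One possible way to feed it is the output of `aws ec2 describe-launch-template-versions --launch-template-id <id> --versions '$Latest' --query 'LaunchTemplateVersions[0].LaunchTemplateData.UserData' --output text`, where `<id>` is a placeholder for the launch template created for the nodegroup. With the userdata shown above, a search for `sudo mount -a` would fail, while a search for `touch /run/xtables.lock` would succeed.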

How to reproduce it?

Use the following YAML to create a nodegroup for EKS 1.30. Execute command: eksctl create ng -f xxx.yaml

  - name: nodegroup
    instanceType: c6a.large
    minSize: 0
    desiredCapacity: 1
    maxSize: 2
    volumeSize: 30
    volumeType: 'gp3'
    privateNetworking: true
    preBootstrapCommands:
      - "sudo mkfs.xfs /dev/nvme1n1; sudo mkdir -p /var/lib/containerd ;sudo echo /dev/nvme1n1 /var/lib/containerd xfs defaults,noatime 1 2 >> /etc/fstab"
      - "sudo mount -a"
    additionalVolumes:
      - volumeName: '/dev/xvdb' # required
        volumeSize: 50
        volumeType: 'gp3'

Logs
2024-07-29 03:13:13 [ℹ] nodegroup "xxxx-nodegroup" will use "" [AmazonLinux2023/1.30]
2024-07-29 03:13:13 [ℹ] nodegroup "nodegroup" will use "" [AmazonLinux2023/1.30]
2024-07-29 03:13:17 [ℹ] 1 existing nodegroup(s) (xxxx-nodegroup) will be excluded
2024-07-29 03:13:17 [ℹ] 1 nodegroup (nodegroup) was included (based on the include/exclude rules)
2024-07-29 03:13:17 [ℹ] will create a CloudFormation stack for each of 1 managed nodegroups in cluster "xxxx"
2024-07-29 03:13:17 [ℹ]
2 sequential tasks: { fix cluster compatibility, 1 task: { 1 task: { create managed nodegroup "nodegroup" } }
}
2024-07-29 03:13:17 [ℹ] checking cluster stack for missing resources
2024-07-29 03:13:19 [ℹ] cluster stack has all required resources
2024-07-29 03:13:21 [ℹ] building managed nodegroup stack "eksctl-xxxx-nodegroup-nodegroup"
2024-07-29 03:13:22 [ℹ] deploying stack "eksctl-xxxx-nodegroup-nodegroup"
2024-07-29 03:13:22 [ℹ] waiting for CloudFormation stack "eksctl-xxxx-nodegroup-nodegroup"
2024-07-29 03:13:53 [ℹ] waiting for CloudFormation stack "eksctl-xxxx-nodegroup-nodegroup"
2024-07-29 03:14:44 [ℹ] waiting for CloudFormation stack "eksctl-xxxx-nodegroup-nodegroup"
2024-07-29 03:16:22 [ℹ] waiting for CloudFormation stack "eksctl-xxxx-nodegroup-nodegroup"
2024-07-29 03:16:22 [ℹ] no tasks
2024-07-29 03:16:22 [✔] created 0 nodegroup(s) in cluster "xxxx"
2024-07-29 03:16:22 [✔] created 1 managed nodegroup(s) in cluster "xxxx"
2024-07-29 03:16:24 [ℹ] checking security group configuration for all nodegroups
2024-07-29 03:16:24 [ℹ] all nodegroups have up-to-date cloudformation templates

Anything else we need to know?
This works as expected when I use the AL2 AMI in the same cluster:

  - name: nodegroup2
    amiFamily: AmazonLinux2
    instanceType: c6a.large
    minSize: 0
    desiredCapacity: 1
    maxSize: 2
    volumeSize: 30
    volumeType: 'gp3'
    privateNetworking: true
    preBootstrapCommands:
      - "sudo mkfs.xfs /dev/nvme1n1; sudo mkdir -p /var/lib/containerd ;sudo echo /dev/nvme1n1 /var/lib/containerd xfs defaults,noatime 1 2 >> /etc/fstab"
      - "sudo mount -a"
    additionalVolumes:
      - volumeName: '/dev/xvdb' # required
        volumeSize: 50
        volumeType: 'gp3'

Versions

eksctl version: 0.187.0
kubectl version: v1.24.0
OS: linux

cPu1 commented

preBootstrapCommands is not supported for AL2023 nodegroups. This validation exists for self-managed nodegroups but is missing for managed nodegroups, so create nodegroup silently ignores that field rather than failing early with an error. We'll work on a fix soon.
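
Until that validation lands, the fail-early behaviour described above can be approximated on the user side. The following is only a sketch of the idea, not eksctl's actual validation code; the function name `validate_prebootstrap` and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not eksctl's actual validation code.
# Args: <amiFamily> <number of preBootstrapCommands configured>
validate_prebootstrap() {
  local ami_family="$1" command_count="$2"
  # Reject the combination that is currently ignored silently.
  if [ "$ami_family" = "AmazonLinux2023" ] && [ "$command_count" -gt 0 ]; then
    echo "preBootstrapCommands is not supported for ${ami_family} nodegroups" >&2
    return 1
  fi
  return 0
}
```

For the nodegroup in this report, `validate_prebootstrap AmazonLinux2023 2` returns a non-zero status, whereas the AL2 nodegroup (`validate_prebootstrap AmazonLinux2 2`) passes.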

What is the alternative if preBootstrapCommands is not supported for AL2023?

I agree, what should we use instead? The question perhaps should be: Are there any plans to create something more or less equivalent to preBootstrapCommands available in AL2023? This is the one thing that stops us from using AL2023.

We NEED preBootstrapCommands to work because we rely on it to provide custom CA certificates for pulling container images from a private container registry.

AL2023 is now the default, so please understand this is going to affect a lot of customers without them even realizing it.

@TiberiuGC any update on when something will be supported for AL2023?

My take on this is that the most urgent matter is adding a validation for managed nodegroups, so that we don't end up impacting customers in the way described above. We'll likely have a fix for this next week.

As for preBootstrapCommands / overrideBootstrapCommand alternatives for AL2023, I don't have a date to share yet. I'll bump this internally so we can correctly assess where it stands in our backlog of priorities. I can appreciate there's considerable community interest, and I'll make sure to articulate that.

@TiberiuGC Just ran into this issue myself and burned a few hours troubleshooting. I use preBootstrapCommands to inject HTTP proxy env vars, and this is a must-have for working in a locked-down corporate environment.

A warning message with instructions to fall back to Amazon Linux 2 would be helpful, but this is really a showstopper for enterprise customers. I simply can't use AL2023 without injecting HTTP proxy settings.

Also tell management this disproportionately impacts enterprise customers who have fat budgets and are looking to spin up massive instances to run their internal apps that maybe a handful of people actually use and then turn around and forget they're running...forever. So much compute billing...

Happy to help however I can. Where would one start if they're interested in injecting preBootstrapCommands in AL2023?

cPu1 commented

@jonathanfoster, we are working on adding support for preBootstrapCommands in AL2023. Please stay tuned.

Any ETA?

+1 🆙