Bottlerocket GPU nodes fail to join the cluster after updating the EKS module from 19.21.0 to 20.31.6
Description
I am upgrading the EKS module from 19.21.0 to 20.31.6. With 19.21.0 I could deploy the managed node groups below with Bottlerocket AMIs, and both the generic CPU nodes and the GPU nodes joined the cluster. With version 20 the generic CPU nodes still join the cluster, but the GPU nodes never join, even though I am using the same user data block as in 19.21.0. I also cannot connect to the GPU nodes through SSM to troubleshoot further, even though they have the same IAM role attached as the CPU nodes.
My EKS version is 1.31 and the Bottlerocket AMI release version is 1.29.0-c55d099c.
Here is the first part of the module call:
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "20.31.6"
# version = "19.21.0" # previous version, before the upgrade
cluster_name = local.cluster_name
cluster_version = local.env.cluster.cluster_version
cluster_endpoint_public_access = true
cluster_timeouts = {
create = "2h" # Timeout for creating the EKS cluster
update = "2h" # Timeout for updating the EKS cluster
delete = "2h" # Timeout for deleting the EKS cluster
}
authentication_mode = "API_AND_CONFIG_MAP"
enable_cluster_creator_admin_permissions = true
Jumping down to the managed groups:
eks_managed_node_groups = {
# bigbang Generic EKS Managed Node Groups
bigbang_generic = {
name = "${local.env.name}-agent-${local.random_name_suffix}"
use_name_prefix = true
subnet_ids = local.env.vpc.private_subnet_ids
# subnet_ids = local.env.vpc.public_subnet_ids
instance_types = [local.env.cluster.agent.type]
# Change the below to deploy either Amazon EKS optimized AMI or Bottlerocket AMI - reference the data.tf file for values
# ami_id = data.aws_ami.eks_default.image_id
ami_type = "BOTTLEROCKET_x86_64"
# ami_id = data.aws_ami.eks_default_bottlerocket.image_id
min_size = local.env.cluster.agent.replicas.min
desired_size = local.env.cluster.agent.replicas.desired
max_size = local.env.cluster.agent.replicas.max
# Must set to false when using Bottlerocket OS AMI for EKS nodes.
enable_bootstrap_user_data = false
# When using bottlerocket, the supplied user data (TOML format) is merged in with the values supplied by EKS. Therefore, pre_bootstrap_user_data and post_bootstrap_user_data are not valid since the bottlerocket OS handles when various settings are applied.
bootstrap_extra_args = <<-EOT
[settings.host-containers.admin]
enabled = true
[settings.host-containers.control]
enabled = true
[settings.kernel]
lockdown = "integrity"
[settings.kubernetes.node-labels]
"bottlerocket.aws/updater-interface-version" = "2.0.0"
[settings.kubernetes]
cluster-name = "${module.eks.cluster_name}"
api-server = "${module.eks.cluster_endpoint}"
cluster-certificate = "${module.eks.cluster_certificate_authority_data}"
EOT
# Set this to true if you want the cluster to roll to new nodes when AWS releases updated EKS node images
force_update_version = true
labels = {
"bottlerocket.aws/updater-interface-version" = "2.0.0"
GithubRepo = "terraform-aws-eks"
GithubOrg = "terraform-aws-modules"
}
update_config = {
max_unavailable_percentage = 33 # or set `max_unavailable`
}
description = "EKS managed node group example launch template"
ebs_optimized = true
disable_api_termination = false
enable_monitoring = true
# This is for the AWS EKS Optimized AMI image
# block_device_mappings = {
# xvda = {
# device_name = "/dev/xvda"
# ebs = {
# volume_size = 500
# volume_type = "gp3"
# iops = 3000
# throughput = 150
# encrypted = true
# kms_key_id = module.ebs_kms_key.key_arn
# delete_on_termination = true
# }
# }
# }
# This is for Bottlerocket CPU AMI
block_device_mappings = {
xvda = {
device_name = "/dev/xvda"
ebs = {
volume_size = 2
volume_type = "gp3"
iops = 3000
throughput = 150
encrypted = true
kms_key_id = module.ebs_kms_key.key_arn
delete_on_termination = true
}
}
xvdb = {
device_name = "/dev/xvdb"
ebs = {
volume_size = 20
volume_type = "gp3"
iops = 3000
throughput = 150
encrypted = true
kms_key_id = module.ebs_kms_key.key_arn
delete_on_termination = true
}
}
}
metadata_options = {
http_endpoint = "enabled"
http_tokens = "required"
http_put_response_hop_limit = 2
instance_metadata_tags = "disabled"
}
create_iam_role = true
iam_role_name = "bigbang-eks-managed-node-group"
iam_role_use_name_prefix = false
iam_role_description = "EKS managed node group for bigbang role"
iam_role_tags = {
Purpose = "Protector of the kubelet"
}
iam_role_additional_policies = {
AmazonEC2ContainerRegistryReadOnly = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
additional = aws_iam_policy.node_additional.arn
AmazonEc2FullAccess = "arn:aws:iam::aws:policy/AmazonEC2FullAccess"
CloudWatchLogsFullAccess = "arn:aws:iam::aws:policy/CloudWatchLogsFullAccess"
SecretsManagerReadWrite = "arn:aws:iam::aws:policy/SecretsManagerReadWrite"
AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
},
bigbang_gpu = {
name = "${local.env.name}-agent-gpu-${local.random_name_suffix}"
use_name_prefix = true
subnet_ids = local.env.vpc.private_subnet_ids
# subnet_ids = local.env.vpc.public_subnet_ids
instance_types = [local.env.cluster.agent.gpu_type]
# Change the below to deploy either Amazon EKS optimized AMI or Bottlerocket AMI - reference the data.tf file for values
# ami_id = data.aws_ami.eks_default_gpu.image_id
ami_type = "BOTTLEROCKET_x86_64_NVIDIA"
# ami_id = data.aws_ami.eks_default_bottlerocket_gpu.image_id
min_size = local.env.cluster.agent.gpu_replicas.min
desired_size = local.env.cluster.agent.gpu_replicas.desired
max_size = local.env.cluster.agent.gpu_replicas.max
# Must set to false when using Bottlerocket OS AMI for EKS nodes.
enable_bootstrap_user_data = false
# When using bottlerocket, the supplied user data (TOML format) is merged in with the values supplied by EKS. Therefore, pre_bootstrap_user_data and post_bootstrap_user_data are not valid since the bottlerocket OS handles when various settings are applied.
bootstrap_extra_args = <<-EOT
[settings.host-containers.admin]
enabled = true
[settings.host-containers.control]
enabled = true
[settings.kernel]
lockdown = "integrity"
[settings.kubernetes.node-labels]
"bottlerocket.aws/updater-interface-version" = "2.0.0"
[settings.kubernetes]
cluster-name = "${module.eks.cluster_name}"
api-server = "${module.eks.cluster_endpoint}"
cluster-certificate = "${module.eks.cluster_certificate_authority_data}"
EOT
# Set this to true if you want the cluster to roll to new nodes when AWS releases updated EKS node images
force_update_version = true
labels = {
"bottlerocket.aws/updater-interface-version" = "2.0.0"
GithubRepo = "terraform-aws-eks"
GithubOrg = "terraform-aws-modules"
nodePool = "gpu"
}
taints = [
{
key = "dedicated"
value = "gpuGroup"
effect = "NO_SCHEDULE"
}
]
#update_config = {
# max_unavailable_percentage = 33 # or set `max_unavailable`
#}
description = "EKS managed node group example launch template"
ebs_optimized = true
disable_api_termination = false
enable_monitoring = true
# This is for the AWS EKS Optimized AMI image
# block_device_mappings = {
# xvda = {
# device_name = "/dev/xvda"
# ebs = {
# volume_size = 200
# volume_type = "gp3"
# iops = 3000
# throughput = 150
# encrypted = true
# kms_key_id = module.ebs_kms_key.key_arn
# delete_on_termination = true
# }
# }
# }
# This is for Bottlerocket GPU AMI
block_device_mappings = {
xvda = {
device_name = "/dev/xvda"
ebs = {
volume_size = 4
volume_type = "gp3"
iops = 3000
throughput = 150
encrypted = true
kms_key_id = module.ebs_kms_key.key_arn
delete_on_termination = true
}
}
xvdb = {
device_name = "/dev/xvdb"
ebs = {
volume_size = 18
volume_type = "gp3"
iops = 3000
throughput = 150
encrypted = true
kms_key_id = module.ebs_kms_key.key_arn
delete_on_termination = true
}
}
}
metadata_options = {
http_endpoint = "enabled"
http_tokens = "required"
http_put_response_hop_limit = 2
instance_metadata_tags = "disabled"
}
create_iam_role = false
iam_role_arn = "arn:aws:iam::971870020263:role/bigbang-eks-managed-node-group"
tags = local.env.tags
}
}
tags = local.env.tags
depends_on = [module.vpc, module.elb, module.elb_passthrough]
}
This is the error received when upgrading from v19 to v20:
Error: waiting for EKS Node Group (bigbang-development-28i:bigbang-development-agent-gpu-28i-20241224115215482100000029) version update (fc5a4074-f281-317e-a851-472bceffb830): unexpected state 'Failed', wanted target 'Successful'. last error: : NodeCreationFailure: Couldn't proceed with upgrade process as new nodes are not joining node group bigbang-development-agent-gpu-28i-20241224115215482100000029
│
│ with module.eks.module.eks_managed_node_group["bigbang_gpu"].aws_eks_node_group.this[0],
│ on .terraform\modules\eks\modules\eks-managed-node-group\main.tf line 392, in resource "aws_eks_node_group" "this":
│ 392: resource "aws_eks_node_group" "this" {
│
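For anyone trying to reproduce this, the user data that the module itself renders into the GPU launch template (before EKS merges in its own settings) can probably be inspected with a sketch like the one below. It assumes the module's `eks_managed_node_groups` output exposes a `launch_template_id` attribute per node group; the data source label `gpu` and the output name `gpu_launch_template_user_data` are placeholders, not names from my configuration.

```hcl
# Sketch only: look up the launch template the module created for the GPU
# node group. Assumes the `eks_managed_node_groups` output exposes
# `launch_template_id` for each group key.
data "aws_launch_template" "gpu" {
  id = module.eks.eks_managed_node_groups["bigbang_gpu"].launch_template_id
}

# Launch templates store user data base64-encoded, so decode it to get
# the TOML that Bottlerocket will receive from the module.
# The output name is a placeholder.
output "gpu_launch_template_user_data" {
  value = base64decode(data.aws_launch_template.gpu.user_data)
}
```

Running `terraform output gpu_launch_template_user_data` on both module versions would show what changed on the module side. It would not include the values EKS adds later, such as the `eks.amazonaws.com/*` node labels.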
Here is the user data that is passed to the Bottlerocket CPU nodes in version 19:
[settings.kubernetes]
"cluster-name" = "bigbang-development-28i"
"api-server" = "https://D17B803059777D9F62BD52A5EE8416E0.gr7.us-east-1.eks.amazonaws.com"
"cluster-certificate" = "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURCVENDQWUyZ0F3SUJBZ0lJYXVTQWg2elFzb3d3RFFZSktvWklodmNOQVFFTEJRQXdGVEVUTUJFR0ExVUUKQXhNS2EzVmlaWEp1WlhSbGN6QWVGdzB5TkRFeU1qUXhNVFF3TXpSYUZ3MHpOREV5TWpJeE1UUTFNelJhTUJVeApFekFSQmdOVkJBTVRDbXQxWW1WeWJtVjBaWE13Z2dFaU1BMEdDU3FHU0liM0RRRUJBUVVBQTRJQkR3QXdnZ0VLCkFvSUJBUURNSGdLbkphOWd6Snlqak1XUzRZZ3NRRmh2Z2dCQVRrYjJTNnY2K0dpWGxpczZ4UDhvUE5VN2hDSDYKT2doUnlrUjI0VGVCQkQvQnRBODJ0NXYxMzBGdTBBU1RPcmhNT0M2NDh0TzNodFhJdytaa1pmeVFIQ2JlYWx3awoxWnZTbHAxVlRLeG5YWmZ5QVpEK2lPdEhjci9HbEEzc1hkcnBVSVdrSkN2NGFQZVZPQnZSVm8wbmw5dmx3RzlsClZhc1ZaVFhIcWlKMjVEWitoV2U2emNydm9KdFFCdVVjZTB1OUd3MXBReUc5eEFUNjdZSElhRTBNZkdLelMwdGoKMklPb2s3eHNpbUZMRkxjKzUwS09UY25pRkJpOWMyL1IzMGlJanJhd3RyL2F5S1NsNnA5MWh1ek5XK2E4YVZYRgpqbGJlR3dXSVhZYjVUTy9KczBLdXQ1V1ZxM2J4QWdNQkFBR2pXVEJYTUE0R0ExVWREd0VCL3dRRUF3SUNwREFQCkJnTlZIUk1CQWY4RUJUQURBUUgvTUIwR0ExVWREZ1FXQkJRQS82aGJUaVpXZ3VKcHBuU3BQWXltWFArVFl6QVYKQmdOVkhSRUVEakFNZ2dwcmRXSmxjbTVsZEdWek1BMEdDU3FHU0liM0RRRUJDd1VBQTRJQkFRQ3psajNnM0RpaAphbVh0RlcwejhORFdxN2dnN1ltU2hoRW5INUpaNThuSVZ0Wk1DaHBLNFdBR3krQzZwSVVIYlQ4YUFydGYxSmk0CkxiSTFiZys4WitLMmpmYTYvWmtBeWFyS2gvK25PV0tTKzVvN3lXUnBaSVpBSHJYb1lPUk9aOTZGaHZmaTJ1dFcKZGdaNHBNT1NZcUdIaStmNXhxMlBiYnlPUkU5R3F5R0p0UmI1aEw1aGNyNnRST1dVaHl1UE1BS01TMjNPbzVCMAo0dnFTcytseGxORkRJa2Z5T3IxaitLRWpCYUhIVWNZMjQ2cTExSnIxR3oyQTF6ZXAvNmgrTVBlYzMydWNFeVZiCis3TTlMbFlWMVpRNDFIWUhKdi9tNit4N1hOTzRDMmoxSGlrNXo1dm9tU2xMUUU2OG1HcGZ3Tm8zVGh6Ti9ubXkKY0pZNUllQStRUmxlCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K"
"cluster-dns-ip" = "172.20.0.10"
"max-pods" = 58
[settings.kubernetes.node-labels]
"eks.amazonaws.com/sourceLaunchTemplateVersion" = "1"
"bottlerocket.aws/updater-interface-version" = "2.0.0"
"GithubRepo" = "terraform-aws-eks"
"eks.amazonaws.com/nodegroup-image" = "ami-08e8202f7551d19fb"
"eks.amazonaws.com/capacityType" = "ON_DEMAND"
"eks.amazonaws.com/nodegroup" = "bigbang-development-agent-28i-20241224115215482100000027"
"eks.amazonaws.com/sourceLaunchTemplateId" = "lt-0a5a16bb240eab278"
"GithubOrg" = "terraform-aws-modules"
Here is the user data that is passed to the GPU nodes in version 19:
[settings.kubernetes]
"cluster-name" = "bigbang-development-28i"
"api-server" = "https://D17B803059777D9F62BD52A5EE8416E0.gr7.us-east-1.eks.amazonaws.com"
"cluster-certificate" = "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURCVENDQWUyZ0F3SUJBZ0lJYXVTQWg2elFzb3d3RFFZSktvWklodmNOQVFFTEJRQXdGVEVUTUJFR0ExVUUKQXhNS2EzVmlaWEp1WlhSbGN6QWVGdzB5TkRFeU1qUXhNVFF3TXpSYUZ3MHpOREV5TWpJeE1UUTFNelJhTUJVeApFekFSQmdOVkJBTVRDbXQxWW1WeWJtVjBaWE13Z2dFaU1BMEdDU3FHU0liM0RRRUJBUVVBQTRJQkR3QXdnZ0VLCkFvSUJBUURNSGdLbkphOWd6Snlqak1XUzRZZ3NRRmh2Z2dCQVRrYjJTNnY2K0dpWGxpczZ4UDhvUE5VN2hDSDYKT2doUnlrUjI0VGVCQkQvQnRBODJ0NXYxMzBGdTBBU1RPcmhNT0M2NDh0TzNodFhJdytaa1pmeVFIQ2JlYWx3awoxWnZTbHAxVlRLeG5YWmZ5QVpEK2lPdEhjci9HbEEzc1hkcnBVSVdrSkN2NGFQZVZPQnZSVm8wbmw5dmx3RzlsClZhc1ZaVFhIcWlKMjVEWitoV2U2emNydm9KdFFCdVVjZTB1OUd3MXBReUc5eEFUNjdZSElhRTBNZkdLelMwdGoKMklPb2s3eHNpbUZMRkxjKzUwS09UY25pRkJpOWMyL1IzMGlJanJhd3RyL2F5S1NsNnA5MWh1ek5XK2E4YVZYRgpqbGJlR3dXSVhZYjVUTy9KczBLdXQ1V1ZxM2J4QWdNQkFBR2pXVEJYTUE0R0ExVWREd0VCL3dRRUF3SUNwREFQCkJnTlZIUk1CQWY4RUJUQURBUUgvTUIwR0ExVWREZ1FXQkJRQS82aGJUaVpXZ3VKcHBuU3BQWXltWFArVFl6QVYKQmdOVkhSRUVEakFNZ2dwcmRXSmxjbTVsZEdWek1BMEdDU3FHU0liM0RRRUJDd1VBQTRJQkFRQ3psajNnM0RpaAphbVh0RlcwejhORFdxN2dnN1ltU2hoRW5INUpaNThuSVZ0Wk1DaHBLNFdBR3krQzZwSVVIYlQ4YUFydGYxSmk0CkxiSTFiZys4WitLMmpmYTYvWmtBeWFyS2gvK25PV0tTKzVvN3lXUnBaSVpBSHJYb1lPUk9aOTZGaHZmaTJ1dFcKZGdaNHBNT1NZcUdIaStmNXhxMlBiYnlPUkU5R3F5R0p0UmI1aEw1aGNyNnRST1dVaHl1UE1BS01TMjNPbzVCMAo0dnFTcytseGxORkRJa2Z5T3IxaitLRWpCYUhIVWNZMjQ2cTExSnIxR3oyQTF6ZXAvNmgrTVBlYzMydWNFeVZiCis3TTlMbFlWMVpRNDFIWUhKdi9tNit4N1hOTzRDMmoxSGlrNXo1dm9tU2xMUUU2OG1HcGZ3Tm8zVGh6Ti9ubXkKY0pZNUllQStRUmxlCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K"
"cluster-dns-ip" = "172.20.0.10"
"max-pods" = 58
[settings.kubernetes.node-labels]
"nodePool" = "gpu"
"eks.amazonaws.com/sourceLaunchTemplateVersion" = "1"
"bottlerocket.aws/updater-interface-version" = "2.0.0"
"GithubRepo" = "terraform-aws-eks"
"eks.amazonaws.com/nodegroup-image" = "ami-016501e3b19da26b2"
"eks.amazonaws.com/capacityType" = "ON_DEMAND"
"eks.amazonaws.com/nodegroup" = "bigbang-development-agent-gpu-28i-20241224115215482100000029"
"eks.amazonaws.com/sourceLaunchTemplateId" = "lt-0645b7457ca252e23"
"GithubOrg" = "terraform-aws-modules"
[settings.kubernetes.node-taints]
"dedicated" = "gpuGroup:NoSchedule"
After the upgrade to version 20, this is what the user data looks like for the CPU nodes:
settings.kubernetes.cluster-name = 'bigbang-development-28i'
settings.kubernetes.api-server = 'https://D17B803059777D9F62BD52A5EE8416E0.gr7.us-east-1.eks.amazonaws.com'
settings.kubernetes.cluster-certificate = 'LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURCVENDQWUyZ0F3SUJBZ0lJYXVTQWg2elFzb3d3RFFZSktvWklodmNOQVFFTEJRQXdGVEVUTUJFR0ExVUUKQXhNS2EzVmlaWEp1WlhSbGN6QWVGdzB5TkRFeU1qUXhNVFF3TXpSYUZ3MHpOREV5TWpJeE1UUTFNelJhTUJVeApFekFSQmdOVkJBTVRDbXQxWW1WeWJtVjBaWE13Z2dFaU1BMEdDU3FHU0liM0RRRUJBUVVBQTRJQkR3QXdnZ0VLCkFvSUJBUURNSGdLbkphOWd6Snlqak1XUzRZZ3NRRmh2Z2dCQVRrYjJTNnY2K0dpWGxpczZ4UDhvUE5VN2hDSDYKT2doUnlrUjI0VGVCQkQvQnRBODJ0NXYxMzBGdTBBU1RPcmhNT0M2NDh0TzNodFhJdytaa1pmeVFIQ2JlYWx3awoxWnZTbHAxVlRLeG5YWmZ5QVpEK2lPdEhjci9HbEEzc1hkcnBVSVdrSkN2NGFQZVZPQnZSVm8wbmw5dmx3RzlsClZhc1ZaVFhIcWlKMjVEWitoV2U2emNydm9KdFFCdVVjZTB1OUd3MXBReUc5eEFUNjdZSElhRTBNZkdLelMwdGoKMklPb2s3eHNpbUZMRkxjKzUwS09UY25pRkJpOWMyL1IzMGlJanJhd3RyL2F5S1NsNnA5MWh1ek5XK2E4YVZYRgpqbGJlR3dXSVhZYjVUTy9KczBLdXQ1V1ZxM2J4QWdNQkFBR2pXVEJYTUE0R0ExVWREd0VCL3dRRUF3SUNwREFQCkJnTlZIUk1CQWY4RUJUQURBUUgvTUIwR0ExVWREZ1FXQkJRQS82aGJUaVpXZ3VKcHBuU3BQWXltWFArVFl6QVYKQmdOVkhSRUVEakFNZ2dwcmRXSmxjbTVsZEdWek1BMEdDU3FHU0liM0RRRUJDd1VBQTRJQkFRQ3psajNnM0RpaAphbVh0RlcwejhORFdxN2dnN1ltU2hoRW5INUpaNThuSVZ0Wk1DaHBLNFdBR3krQzZwSVVIYlQ4YUFydGYxSmk0CkxiSTFiZys4WitLMmpmYTYvWmtBeWFyS2gvK25PV0tTKzVvN3lXUnBaSVpBSHJYb1lPUk9aOTZGaHZmaTJ1dFcKZGdaNHBNT1NZcUdIaStmNXhxMlBiYnlPUkU5R3F5R0p0UmI1aEw1aGNyNnRST1dVaHl1UE1BS01TMjNPbzVCMAo0dnFTcytseGxORkRJa2Z5T3IxaitLRWpCYUhIVWNZMjQ2cTExSnIxR3oyQTF6ZXAvNmgrTVBlYzMydWNFeVZiCis3TTlMbFlWMVpRNDFIWUhKdi9tNit4N1hOTzRDMmoxSGlrNXo1dm9tU2xMUUU2OG1HcGZ3Tm8zVGh6Ti9ubXkKY0pZNUllQStRUmxlCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K'
settings.kubernetes.cluster-dns-ip = '172.20.0.10'
settings.kubernetes.max-pods = 58
settings.kubernetes.node-labels.'eks.amazonaws.com/sourceLaunchTemplateVersion' = '2'
settings.kubernetes.node-labels.'bottlerocket.aws/updater-interface-version' = '2.0.0'
settings.kubernetes.node-labels.GithubRepo = 'terraform-aws-eks'
settings.kubernetes.node-labels.'eks.amazonaws.com/nodegroup-image' = 'ami-08e8202f7551d19fb'
settings.kubernetes.node-labels.'eks.amazonaws.com/capacityType' = 'ON_DEMAND'
settings.kubernetes.node-labels.'eks.amazonaws.com/nodegroup' = 'bigbang-development-agent-28i-20241224115215482100000027'
settings.kubernetes.node-labels.'eks.amazonaws.com/sourceLaunchTemplateId' = 'lt-0a5a16bb240eab278'
settings.kubernetes.node-labels.GithubOrg = 'terraform-aws-modules'
settings.host-containers.admin.enabled = true
settings.host-containers.control.enabled = true
settings.kernel.lockdown = 'integrity'
For the GPU instances, I had to delete the managed node group from the EKS console and rerun Terraform to get them to build. This is their user data in version 20:
settings.kubernetes.cluster-name = 'bigbang-development-28i'
settings.kubernetes.api-server = 'https://D17B803059777D9F62BD52A5EE8416E0.gr7.us-east-1.eks.amazonaws.com'
settings.kubernetes.cluster-certificate = 'LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURCVENDQWUyZ0F3SUJBZ0lJYXVTQWg2elFzb3d3RFFZSktvWklodmNOQVFFTEJRQXdGVEVUTUJFR0ExVUUKQXhNS2EzVmlaWEp1WlhSbGN6QWVGdzB5TkRFeU1qUXhNVFF3TXpSYUZ3MHpOREV5TWpJeE1UUTFNelJhTUJVeApFekFSQmdOVkJBTVRDbXQxWW1WeWJtVjBaWE13Z2dFaU1BMEdDU3FHU0liM0RRRUJBUVVBQTRJQkR3QXdnZ0VLCkFvSUJBUURNSGdLbkphOWd6Snlqak1XUzRZZ3NRRmh2Z2dCQVRrYjJTNnY2K0dpWGxpczZ4UDhvUE5VN2hDSDYKT2doUnlrUjI0VGVCQkQvQnRBODJ0NXYxMzBGdTBBU1RPcmhNT0M2NDh0TzNodFhJdytaa1pmeVFIQ2JlYWx3awoxWnZTbHAxVlRLeG5YWmZ5QVpEK2lPdEhjci9HbEEzc1hkcnBVSVdrSkN2NGFQZVZPQnZSVm8wbmw5dmx3RzlsClZhc1ZaVFhIcWlKMjVEWitoV2U2emNydm9KdFFCdVVjZTB1OUd3MXBReUc5eEFUNjdZSElhRTBNZkdLelMwdGoKMklPb2s3eHNpbUZMRkxjKzUwS09UY25pRkJpOWMyL1IzMGlJanJhd3RyL2F5S1NsNnA5MWh1ek5XK2E4YVZYRgpqbGJlR3dXSVhZYjVUTy9KczBLdXQ1V1ZxM2J4QWdNQkFBR2pXVEJYTUE0R0ExVWREd0VCL3dRRUF3SUNwREFQCkJnTlZIUk1CQWY4RUJUQURBUUgvTUIwR0ExVWREZ1FXQkJRQS82aGJUaVpXZ3VKcHBuU3BQWXltWFArVFl6QVYKQmdOVkhSRUVEakFNZ2dwcmRXSmxjbTVsZEdWek1BMEdDU3FHU0liM0RRRUJDd1VBQTRJQkFRQ3psajNnM0RpaAphbVh0RlcwejhORFdxN2dnN1ltU2hoRW5INUpaNThuSVZ0Wk1DaHBLNFdBR3krQzZwSVVIYlQ4YUFydGYxSmk0CkxiSTFiZys4WitLMmpmYTYvWmtBeWFyS2gvK25PV0tTKzVvN3lXUnBaSVpBSHJYb1lPUk9aOTZGaHZmaTJ1dFcKZGdaNHBNT1NZcUdIaStmNXhxMlBiYnlPUkU5R3F5R0p0UmI1aEw1aGNyNnRST1dVaHl1UE1BS01TMjNPbzVCMAo0dnFTcytseGxORkRJa2Z5T3IxaitLRWpCYUhIVWNZMjQ2cTExSnIxR3oyQTF6ZXAvNmgrTVBlYzMydWNFeVZiCis3TTlMbFlWMVpRNDFIWUhKdi9tNit4N1hOTzRDMmoxSGlrNXo1dm9tU2xMUUU2OG1HcGZ3Tm8zVGh6Ti9ubXkKY0pZNUllQStRUmxlCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K'
settings.kubernetes.cluster-dns-ip = '172.20.0.10'
settings.kubernetes.max-pods = 58
settings.kubernetes.node-labels.nodePool = 'gpu'
settings.kubernetes.node-labels.'eks.amazonaws.com/sourceLaunchTemplateVersion' = '2'
settings.kubernetes.node-labels.'bottlerocket.aws/updater-interface-version' = '2.0.0'
settings.kubernetes.node-labels.GithubRepo = 'terraform-aws-eks'
settings.kubernetes.node-labels.'eks.amazonaws.com/nodegroup-image' = 'ami-016501e3b19da26b2'
settings.kubernetes.node-labels.'eks.amazonaws.com/capacityType' = 'ON_DEMAND'
settings.kubernetes.node-labels.'eks.amazonaws.com/nodegroup' = 'bigbang-development-agent-gpu-28i-20241224141205476300000001'
settings.kubernetes.node-labels.'eks.amazonaws.com/sourceLaunchTemplateId' = 'lt-0645b7457ca252e23'
settings.kubernetes.node-labels.GithubOrg = 'terraform-aws-modules'
settings.kubernetes.node-taints.dedicated = 'gpuGroup:NoSchedule'
settings.host-containers.admin.enabled = true
settings.host-containers.control.enabled = true
settings.kernel.lockdown = 'integrity'
The AMIs are the same across both EKS module versions; only the GPU nodes on version 20 fail to join the cluster.
CPU AMI name - bottlerocket-aws-k8s-1.31-x86_64-v1.29.0-c55d099c
GPU AMI name - bottlerocket-aws-k8s-1.31-nvidia-x86_64-v1.29.0-c55d099c
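Comparing the dumps above, the version 20 user data carries three settings that the version 19 user data does not: `settings.host-containers.admin.enabled`, `settings.host-containers.control.enabled`, and `settings.kernel.lockdown`. All three come from `bootstrap_extra_args`, which suggests that block was not rendered under version 19. One way to narrow this down would be to drop the `[settings.kernel]` override for the GPU group only and see whether the nodes join. The sketch below is an untested isolation step, not a confirmed fix, and the local name `gpu_bootstrap_extra_args` is hypothetical.

```hcl
locals {
  # Hypothetical local: the same TOML as the GPU group's bootstrap_extra_args,
  # minus the [settings.kernel] lockdown override, so the AMI's default
  # kernel lockdown mode applies instead.
  gpu_bootstrap_extra_args = <<-EOT
    [settings.host-containers.admin]
    enabled = true
    [settings.host-containers.control]
    enabled = true
    [settings.kubernetes.node-labels]
    "bottlerocket.aws/updater-interface-version" = "2.0.0"
    [settings.kubernetes]
    cluster-name = "${module.eks.cluster_name}"
    api-server = "${module.eks.cluster_endpoint}"
    cluster-certificate = "${module.eks.cluster_certificate_authority_data}"
  EOT
}
```

In the `bigbang_gpu` group this would be wired in as `bootstrap_extra_args = local.gpu_bootstrap_extra_args`, leaving the CPU group untouched so that only one variable changes between runs. If the GPU nodes then join, the lockdown setting is the likely culprit; if not, the host container settings would be the next thing to remove.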