No additional ENIs are attached after prefix delegation

Question


SeungsuKim opened this issue 4 months ago · 6 comments

What happened:
I've enabled prefix delegation to increase the number of IP addresses assignable to my m6i.2xlarge node from 58 to 110. One node has to run 65 pods. The node has one ENI with three /28 prefixes, which provide 48 IP addresses. Since that is not enough, a new ENI should be attached to the node. However, no additional ENIs are attached, and the remaining pods stay in Pending status with the following events:

Type     Reason                  Age                  From     Message
----     ------                  ----                 ----     -------
Warning  FailedCreatePodSandBox  60m                  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "661d66f608b5b568c7d4e2e3eb9a2f8b158e3b7dac1ab787542452c00bffb1b5": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container

There are enough remaining IP addresses in the subnet.
There are no error logs in the aws-node pods.
WARM_PREFIX_TARGET is set to 1.
An IAM role with AmazonEKS_CNI_Policy is set for the VPC CNI addon.
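The 58, 110 and 48 figures above follow from the max-pods arithmetic AWS documents for the VPC CNI. Below is a minimal Python sketch of that arithmetic. It assumes m6i.2xlarge supports 4 ENIs with 15 IPv4 addresses each and has 8 vCPUs, and it uses the commonly recommended cap of 110 pods for instances with fewer than 30 vCPUs (250 otherwise). The function names are hypothetical.

```python
import math

PREFIX_SIZE = 16  # a /28 IPv4 prefix covers 2**(32 - 28) = 16 addresses


def max_pods_secondary_ip(enis: int, ips_per_eni: int) -> int:
    """Secondary-IP mode: each ENI's primary IP is not handed to pods;
    +2 accounts for host-network pods such as aws-node and kube-proxy."""
    return enis * (ips_per_eni - 1) + 2


def max_pods_prefix(enis: int, ips_per_eni: int, vcpus: int) -> int:
    """Prefix mode: every non-primary slot holds a /28 instead of a single IP;
    the result is then capped at the recommended limit (assumed 110 / 250)."""
    raw = enis * (ips_per_eni - 1) * PREFIX_SIZE + 2
    cap = 110 if vcpus < 30 else 250
    return min(raw, cap)


def prefixes_needed(pod_ips: int) -> int:
    """Number of /28 prefixes required to give pod_ips pods one address each."""
    return math.ceil(pod_ips / PREFIX_SIZE)


if __name__ == "__main__":
    # Assumed m6i.2xlarge limits: 4 ENIs, 15 IPv4 addresses per ENI, 8 vCPUs.
    print(max_pods_secondary_ip(4, 15))  # 58
    print(max_pods_prefix(4, 15, 8))     # 110
    print(3 * PREFIX_SIZE)               # 48 addresses from three prefixes
    print(prefixes_needed(65))           # 5
```

By the same arithmetic, an ENI on this instance has 14 non-primary slots, so in principle the five prefixes needed for 65 pods could fit on the existing ENI without a second one.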

Attach logs

What you expected to happen:

A new ENI with prefixes is attached to the node, so that more IP addresses can be allocated to it.

How to reproduce it (as minimally and precisely as possible):

Create an EKS cluster.
Install the VPC CNI addon, version v1.16.4-eksbuild.2 (the latest at the time). Enable prefix delegation with the following additional configuration:

{"env":{"ENABLE_PREFIX_DELEGATION":"true","WARM_PREFIX_TARGET":"1"}}

Create an IAM role with AmazonEKS_CNI_Policy attached. Set the trust relationship as follows so that the aws-node service account can use the IAM role:

{
  "Version": "2012-10-17",
  "Statement": [
      {
          "Sid": "",
          "Effect": "Allow",
          "Principal": {
              "Federated": "arn:aws:iam::****:oidc-provider/oidc.eks.ap-northeast-2.amazonaws.com/id/6E08DB7F6D1422458CAD446369C0F4BF"
          },
          "Action": "sts:AssumeRoleWithWebIdentity",
          "Condition": {
              "StringEquals": {
                  "oidc.eks.ap-northeast-2.amazonaws.com/id/6E08DB7F6D1422458CAD446369C0F4BF:sub": "system:serviceaccount:kube-system:aws-node",
                  "oidc.eks.ap-northeast-2.amazonaws.com/id/6E08DB7F6D1422458CAD446369C0F4BF:aud": "sts.amazonaws.com"
              }
          }
      }
  ]
}

Provision pods until the node can no longer assign an IP address to a new pod.
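The two JSON documents in these steps can also be generated instead of edited by hand. Below is a minimal Python sketch; the function names are hypothetical, and the account ID and OIDC provider ID in the example are placeholders, not values from this report. As in the configuration above, the addon's environment variable values are passed as strings ("true", "1").

```python
import json


def addon_configuration(warm_prefix_target: int = 1) -> str:
    """Compact configuration values for the VPC CNI addon (env values as strings)."""
    config = {
        "env": {
            "ENABLE_PREFIX_DELEGATION": "true",
            "WARM_PREFIX_TARGET": str(warm_prefix_target),
        }
    }
    return json.dumps(config, separators=(",", ":"))


def cni_trust_policy(account_id: str, region: str, oidc_id: str) -> dict:
    """Trust policy letting the kube-system/aws-node service account assume the role."""
    issuer = f"oidc.eks.{region}.amazonaws.com/id/{oidc_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{issuer}:sub": "system:serviceaccount:kube-system:aws-node",
                        f"{issuer}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


if __name__ == "__main__":
    print(addon_configuration())
    # Placeholder account and OIDC IDs, for illustration only.
    policy = cni_trust_policy("111122223333", "ap-northeast-2", "EXAMPLE0123456789")
    print(json.dumps(policy, indent=2))
```

The string returned by `addon_configuration()` is the same compact JSON shown above and could be passed to `aws eks update-addon --configuration-values`.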

Anything else we need to know?:

Environment:

Kubernetes version (use kubectl version): v1.28.5-eks-5e0fdde
CNI Version: v1.16.4-eksbuild.2
OS (e.g: cat /etc/os-release): MacOS 13.4 (22F66)
Kernel (e.g. uname -a): Darwin seungsukim.local 22.5.0 Darwin Kernel Version 22.5.0: Mon Apr 24 20:52:24 PDT 2023; root:xnu-8796.121.2~5/RELEASE_ARM64_T6000 arm64

Answer 1 · 2024-03-13T21:49:46.000Z

Hi! Hope you are doing well. A few things come to mind.

1/ What subnet size are you using? Prefix delegation requires contiguous /28 blocks, so the failure could be caused by the subnet running out of contiguous /28 blocks.

2/ Do you see any InsufficientCidrBlocks or InsufficientFreeAddressesInSubnet errors in the logs?

3/ Do you have max-pods configured to a lower value?

4/ Are you using a managed node group or self-managed node group? If using a self-managed node group how are the CIDRs reserved?
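Point 1/ is worth checking even when a subnet reports plenty of free addresses, because each /28 prefix must be a free, aligned block of 16 addresses. Below is a minimal Python sketch using the standard `ipaddress` module. The list of in-use addresses is a hypothetical input (in practice it would come from the ENIs and prefixes already in the subnet), and the sketch treats the five addresses AWS reserves in every subnet (the first four and the last) as in use.

```python
import ipaddress
from typing import Iterable, List


def free_prefixes(subnet_cidr: str, used: Iterable[str]) -> List[ipaddress.IPv4Network]:
    """Return the aligned /28 blocks in subnet_cidr that contain no used address."""
    subnet = ipaddress.ip_network(subnet_cidr)
    taken = {ipaddress.ip_address(a) for a in used}
    # AWS reserves the first four addresses and the last address of each subnet.
    taken.update(subnet.network_address + i for i in range(4))
    taken.add(subnet.broadcast_address)
    # Iterating an IPv4Network yields every address in the block.
    return [
        block
        for block in subnet.subnets(new_prefix=28)
        if not any(addr in taken for addr in block)
    ]


if __name__ == "__main__":
    # Hypothetical fragmentation: one address in use in each /28 between the reserved blocks.
    used = [f"10.0.0.{16 * i + 5}" for i in range(1, 15)]
    print(len(free_prefixes("10.0.0.0/24", [])))    # 14
    print(len(free_prefixes("10.0.0.0/24", used)))  # 0, although 237 addresses are free
```

Whether a /24 with 120 or 180 available addresses still contains a whole free /28 therefore depends on how those addresses are spread, not only on their count.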

Answer 2 · 2024-03-14T01:16:50.000Z

@jchen6585

My subnet size is /24. There are 120 and 180 available IPv4 addresses in my two subnets.
Which error log do you mean? There are no error logs in the aws-node pods.
max-pods is configured to 110. After I enabled CNI prefix delegation and created a new node group, the max-pods value automatically changed from 58 to 110.
I'm using a managed node group.

Answer 3 · 2024-03-14T17:01:37.000Z

@SeungsuKim
2/ If you access the node, either through SSH or SSM, you can find the VPC CNI log files in /var/log/aws-routed-eni/. In particular, look at ipamd.log, since ipamd is the component responsible for allocating IPs.
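To narrow down what to look for in ipamd.log, a small filter can help. Below is a minimal Python sketch; it assumes each log line is a JSON object with `level` and `msg` fields (this format is an assumption, so adjust the field names if your version logs differently) and keeps error-level lines plus any line mentioning the EC2 error codes from Answer 1. The function names are hypothetical.

```python
import json
from typing import Iterable, List

# EC2 error codes from Answer 1 that point at subnet exhaustion or fragmentation.
KEYWORDS = ("InsufficientCidrBlocks", "InsufficientFreeAddressesInSubnet")


def interesting_lines(lines: Iterable[str]) -> List[str]:
    """Return messages that are error-level or mention one of KEYWORDS."""
    hits = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            entry = None
        if not isinstance(entry, dict):
            entry = {"msg": raw}  # treat non-JSON lines as plain text
        msg = str(entry.get("msg", ""))
        if entry.get("level") == "error" or any(k in raw for k in KEYWORDS):
            hits.append(msg)
    return hits


def scan_file(path: str = "/var/log/aws-routed-eni/ipamd.log") -> List[str]:
    """Read a log file from the node and filter it with interesting_lines."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return interesting_lines(f)
```

On the node, `for msg in scan_file(): print(msg)` would print the matching messages; an empty result would at least rule out the two subnet-capacity errors.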

Answer 4 · 2024-04-04T20:16:59.000Z

@SeungsuKim, if you are still facing this issue, could you share the ipamd logs as indicated? You can follow this troubleshooting doc: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/troubleshooting.md and send the logs to 'k8s-awscni-triage@amazon.com' for us to investigate.

Answer 5 · 2024-05-01T20:28:47.000Z

Closing pending the request on logs. Please reopen if you run into this again, and share the logs as indicated in the troubleshooting guide.

Answer 6 · 2024-05-01T20:33:49.000Z

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.