aws/amazon-vpc-cni-k8s

No additional ENIs are attached after prefix delegation

SeungsuKim opened this issue · 6 comments

What happened:
I enabled prefix delegation to increase the number of IP addresses assignable to my m6i.2xlarge node from 58 to 110. One node has to run 65 pods. The node has one ENI with three /28 prefixes, which provide 48 IP addresses. Since that is not enough, a new ENI should be attached to the node. However, no additional ENI is attached. The remaining pods are stuck in Pending status with the following events:

Type     Reason                  Age                  From     Message
----     ------                  ----                 ----     -------
Warning  FailedCreatePodSandBox  60m                  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "661d66f608b5b568c7d4e2e3eb9a2f8b158e3b7dac1ab787542452c00bffb1b5": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
  • There are enough remaining IP addresses in the subnet.
  • There are no error logs in the aws-node pods.
  • WARM_PREFIX_TARGET is set to 1.
  • An IAM role with AmazonEKS_CNI_Policy is set for the VPC CNI addon.
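
The numbers above can be checked with a short calculation. This is a minimal sketch, not part of the original report: the instance limits (4 ENIs, 15 IPv4 addresses per ENI for m6i.2xlarge) are assumptions taken from the EC2 instance-type tables, and the function names are hypothetical.

```python
import math

# Assumed limits for m6i.2xlarge (from the EC2 instance-type tables,
# not stated in this issue): 4 ENIs, 15 IPv4 addresses per ENI.
MAX_ENIS = 4
IPS_PER_ENI = 15
PREFIX_SIZE = 16  # a /28 prefix holds 16 addresses


def max_pods_secondary_ip(enis: int, ips_per_eni: int) -> int:
    """Pod limit without prefix delegation: each ENI loses one slot to its
    primary IP, and 2 is added for host-network pods."""
    return enis * (ips_per_eni - 1) + 2


def prefix_capacity(prefixes: int) -> int:
    """Number of pod IPs provided by the given number of /28 prefixes."""
    return prefixes * PREFIX_SIZE


def extra_prefixes_needed(pods: int, prefixes: int) -> int:
    """Additional /28 prefixes required to give every pod an IP."""
    shortfall = max(0, pods - prefix_capacity(prefixes))
    return math.ceil(shortfall / PREFIX_SIZE)


def prefix_slots_per_eni(ips_per_eni: int) -> int:
    """Assumption: every non-primary address slot on an ENI can hold one prefix."""
    return ips_per_eni - 1


print(max_pods_secondary_ip(MAX_ENIS, IPS_PER_ENI))  # 58
print(prefix_capacity(3))                            # 48
print(extra_prefixes_needed(65, 3))                  # 2
print(prefix_slots_per_eni(IPS_PER_ENI))             # 14
```

If the slot assumption holds, the existing ENI still has room for more prefixes, so the missing capacity would not necessarily require a second ENI. Either way, the node is short by two /28 prefixes for 65 pods.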

Attach logs

What you expected to happen:

New ENI is attached to the node with prefixes, so more IP addresses can be allocated to the node.

How to reproduce it (as minimally and precisely as possible):

  1. Create an EKS cluster.
  2. Install the VPC CNI addon, version v1.16.4-eksbuild.2 (latest). Enable prefix delegation with the following additional configuration:
{"env":{"ENABLE_PREFIX_DELEGATION":"true","WARM_PREFIX_TARGET":"1"}}
  3. Create an IAM role with AmazonEKS_CNI_Policy attached. Set the trust relationship as follows, so that the aws-node service account can assume the IAM role.
{
  "Version": "2012-10-17",
  "Statement": [
      {
          "Sid": "",
          "Effect": "Allow",
          "Principal": {
              "Federated": "arn:aws:iam::****:oidc-provider/oidc.eks.ap-northeast-2.amazonaws.com/id/6E08DB7F6D1422458CAD446369C0F4BF"
          },
          "Action": "sts:AssumeRoleWithWebIdentity",
          "Condition": {
              "StringEquals": {
                  "oidc.eks.ap-northeast-2.amazonaws.com/id/6E08DB7F6D1422458CAD446369C0F4BF:sub": "system:serviceaccount:kube-system:aws-node",
                  "oidc.eks.ap-northeast-2.amazonaws.com/id/6E08DB7F6D1422458CAD446369C0F4BF:aud": "sts.amazonaws.com"
              }
          }
      }
  ]
}
  4. Provision pods until the node cannot assign an IP address to a new pod.
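
As a rough cross-check of the configuration in step 2, the sketch below estimates how many /28 prefixes the node would be expected to hold for a given number of pods. It relies on a simplified model that is an assumption here: WARM_PREFIX_TARGET is treated as the number of completely free prefixes kept on top of the prefixes in use, and other settings such as WARM_IP_TARGET or MINIMUM_IP_TARGET are ignored. The function name is hypothetical.

```python
import json
import math

# Addon configuration copied from step 2 of the reproduction steps.
CONFIG = '{"env":{"ENABLE_PREFIX_DELEGATION":"true","WARM_PREFIX_TARGET":"1"}}'

PREFIX_SIZE = 16  # addresses in one /28 prefix


def expected_prefixes(config_json: str, pods_with_ips: int) -> int:
    """Estimate the prefix count under the simplified warm-prefix model.

    Assumption: prefixes held = prefixes needed by running pods
    + WARM_PREFIX_TARGET fully free prefixes.
    """
    env = json.loads(config_json)["env"]
    if env.get("ENABLE_PREFIX_DELEGATION", "false").lower() != "true":
        raise ValueError("prefix delegation is not enabled in this configuration")
    warm_target = int(env["WARM_PREFIX_TARGET"])
    in_use = math.ceil(pods_with_ips / PREFIX_SIZE)
    return in_use + warm_target


print(expected_prefixes(CONFIG, 48))  # 4
print(expected_prefixes(CONFIG, 65))  # 6
```

Under this model a node with 48 pod IPs in use would hold four prefixes rather than the three observed, which suggests the prefix allocation itself stopped and is worth checking in the ipamd logs.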

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.28.5-eks-5e0fdde
  • CNI Version: v1.16.4-eksbuild.2
  • OS (e.g: cat /etc/os-release): MacOS 13.4 (22F66)
  • Kernel (e.g. uname -a): Darwin seungsukim.local 22.5.0 Darwin Kernel Version 22.5.0: Mon Apr 24 20:52:24 PDT 2023; root:xnu-8796.121.2~5/RELEASE_ARM64_T6000 arm64

Hi! Hope you are doing well. A few things come to mind.

1/ What subnet size are you using? Prefix delegation requires contiguous /28 blocks. The failure of the ENI creation could be due to the subnet being out of contiguous /28 blocks.

2/ Do you see any InsufficientCidrBlocks or InsufficientFreeAddressesInSubnet errors in the logs?

3/ Do you have max-pods configured to a lower amount?

4/ Are you using a managed node group or self-managed node group? If using a self-managed node group how are the CIDRs reserved?

@jchen6585

  1. My subnet size is /24. There are 120 and 180 available IPv4 addresses in my two subnets.
  2. Which error log do you mean? There are no error logs in the aws-node pod.
  3. max-pods is configured to 110. I'm using a managed node group. After enabling CNI prefix delegation and creating a new node group, the max-pods value automatically changed from 58 to 110.
  4. I'm using managed node group.
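
A count of free addresses does not by itself show that a free /28 block exists, because a prefix needs 16 contiguous, aligned addresses. The sketch below illustrates the check for a /24 subnet. The subnet CIDR and the list of used addresses are hypothetical, and the function name is an assumption. In practice the used addresses would come from the network interfaces in the subnet.

```python
import ipaddress


def free_prefix_blocks(subnet_cidr: str, used_ips: list[str]) -> list[ipaddress.IPv4Network]:
    """Return the aligned /28 blocks of the subnet that contain no used address."""
    subnet = ipaddress.ip_network(subnet_cidr)
    # AWS reserves the first four addresses and the last address of every subnet.
    reserved = {subnet[0], subnet[1], subnet[2], subnet[3], subnet[-1]}
    used = {ipaddress.ip_address(ip) for ip in used_ips} | reserved
    return [
        block
        for block in subnet.subnets(new_prefix=28)
        if not any(addr in used for addr in block)
    ]


# Hypothetical /24 subnet with nothing allocated: the first and last /28
# blocks are unusable because they contain reserved addresses.
print(len(free_prefix_blocks("10.0.0.0/24", [])))  # 14

# One used address in every /28 block: 235 addresses are still free,
# but no block can be assigned as a prefix.
scattered = [f"10.0.0.{16 * i + 5}" for i in range(16)]
print(len(free_prefix_blocks("10.0.0.0/24", scattered)))  # 0
```

The second case shows how a subnet with well over 100 available addresses can still be out of contiguous /28 blocks when the existing allocations are fragmented.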

@SeungsuKim
2/ If you access the node through either SSH or SSM, you can find the VPC CNI log files under /var/log/aws-routed-eni/. Specifically, look at the ipamd.log file, since ipamd is the component responsible for allocating IPs.
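
A minimal sketch for searching that log for the two error names mentioned earlier in the thread. The helper names are hypothetical and no particular log line format is assumed. The check is a plain substring match.

```python
from collections import Counter
from typing import Iterable

# Error names mentioned earlier in this thread.
ERROR_MARKERS = ("InsufficientCidrBlocks", "InsufficientFreeAddressesInSubnet")


def count_error_markers(lines: Iterable[str]) -> Counter:
    """Count how many lines mention each error marker."""
    counts: Counter = Counter()
    for line in lines:
        for marker in ERROR_MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


def scan_file(path: str) -> Counter:
    """Scan a log file on the node, tolerating undecodable bytes."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_error_markers(handle)


# Usage on the node (path taken from the comment above):
#   print(scan_file("/var/log/aws-routed-eni/ipamd.log"))
```

An empty result means neither error name appears in the file. It does not rule out other failures, so the full log is still worth sharing.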

@SeungsuKim, could you share the ipamd logs as indicated, if you are still facing this issue? You can follow the troubleshooting doc at https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/troubleshooting.md and send the logs to k8s-awscni-triage@amazon.com for us to investigate.

Closing pending the request on logs. Please reopen if you run into this again, and share the logs as indicated in the troubleshooting guide.

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.