aws/amazon-vpc-cni-k8s

CNI Failed to assign IPs while using WARM_IP_TARGET & MINIMUM_IP_TARGET

sidewinder12s opened this issue · 6 comments

What happened:

We have run into IP allocation issues in many of our clusters. Due to some technical constraints, we have a cluster with most of its nodes constrained to a single subnet. We realized a few hundred IPs were being wasted on these nodes because of workload/scheduling constraints (largely a 1-to-1 mapping of node to workload pod), so we tried modifying our IP allocation settings to reduce that waste.

Original:

    WARM_ENI_TARGET: "null"
    WARM_IP_TARGET: "5"
    MINIMUM_IP_TARGET: "10"

New:

    WARM_ENI_TARGET: "null"
    WARM_IP_TARGET: "0"
    MINIMUM_IP_TARGET: "11"

After applying these settings, the majority of workloads/nodes appeared to function as expected with no IP allocation issues. However, we realized that a few longer-lived nodes had run out of IPs and were failing to assign more, even though the subnet had plenty of slack capacity.

We largely had issues on nodes with more than 11 pods that required IPs. It seems like there might be some unhandled condition if you keep the warm IP pool empty?

Once I raised the WARM_IP_TARGET to 1, all pods that were failing to assign IPs appeared to recover.

Attach logs

Logs sent in support case: 14209321561

What you expected to happen:

We would be able to assign IPs

How to reproduce it (as minimally and precisely as possible):

I'd assume you can reproduce it starting from our original settings:

  • Apply them
  • Launch pods
  • Apply the new settings

I think what might trigger it is if you have a minimum IP target that does not meet all pod requirements and then have another pod get scheduled.

Anything else we need to know?:

We are aware this configuration is not recommended and have EC2 API Metrics enabled to observe for throttling.

Environment:

  • Kubernetes version (use kubectl version): 1.25
  • CNI Version: v1.13.3
  • OS (e.g: cat /etc/os-release): EKS AMI v20230825
  • Kernel (e.g. uname -a): 5.10.186-179.751.amzn2

@sidewinder12s I see the problem here, and we really need better documentation around what WARM_IP_TARGET=0 or WARM_ENI_TARGET=0 mean.

Today, the VPC CNI only supports "static" allocation of IPs. At startup, we allocate the MINIMUM targets, then we continuously maintain the WARM thresholds as IPs are used for pods.

WARM_IP_TARGET=0 is "dynamic" IP allocation mode, which we do not fully support. Full support would mean that when the CNI requests an IP and none are available, IPAMD attaches a new ENI (if possible). Today, no new ENIs are attached, so when all IPs are in use, nothing new gets allocated. Customers use WARM_IP_TARGET=0 when they want to allocate a fixed number of MINIMUM IPs and then never allocate again.
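The behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the IPAMD implementation: the function names `ips_short` and `simulate` are hypothetical, the reconciler is assumed to top up to whichever of the warm or minimum target is larger, and ENI limits, subnet capacity, and IP release are ignored.

```python
def ips_short(assigned, total, warm_ip_target, minimum_ip_target):
    """Toy model: how many more IPs the reconciler would request.

    Assumption: the shortfall is the larger of the warm-pool deficit
    and the minimum-target deficit, and never negative.
    """
    available = total - assigned
    short = max(warm_ip_target - available, 0)
    return max(short, minimum_ip_target - total)


def simulate(pods, warm_ip_target, minimum_ip_target):
    """Schedule `pods` pods one at a time.

    Returns (pods that received an IP, total IPs allocated to the node).
    """
    assigned = 0
    # Startup: allocate up to the configured targets.
    total = ips_short(assigned, 0, warm_ip_target, minimum_ip_target)
    for _ in range(pods):
        if total - assigned > 0:
            assigned += 1
        # Otherwise the pod gets no IP: nothing is allocated on demand.
        total += ips_short(assigned, total, warm_ip_target, minimum_ip_target)
    return assigned, total


print(simulate(15, 0, 11))  # WARM_IP_TARGET=0, MINIMUM_IP_TARGET=11
print(simulate(15, 1, 11))  # WARM_IP_TARGET=1, MINIMUM_IP_TARGET=11
print(simulate(15, 5, 10))  # original settings from the report
```

In this model, a node with `WARM_IP_TARGET=0` and `MINIMUM_IP_TARGET=11` stops at 11 assigned pods, because the warm-pool deficit is always zero once the minimum is met. With `WARM_IP_TARGET=1` every assignment leaves a deficit of one, so the pool keeps growing.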

For your use case, setting MINIMUM_IP_TARGET to a reasonable value and then WARM_IP_TARGET=1 makes the most sense. "Dynamic" IP allocation mode is something that has been requested a few times, but it is not currently on our roadmap. The goal for that support would be to never over-provision IPs, and the tradeoff is that scaling operations would take longer due to waiting for ENIs to be attached.

Thanks for confirming, that lines up with what appeared to be happening.

Regarding your second comment about dynamic IP allocation: if you supported fully dynamic allocation, wouldn't the CNI first add IPs to the existing ENI until it was full, or would it always have to allocate additional IPs to a new ENI? I just want to confirm that we're not exposing ourselves to ENI limits. We don't actually have that high a pod density in our clusters, but we'd probably hit ENI limits if each additional IP required a fresh ENI.

Yeah, we would allocate additional IPs to the existing ENI until it is full, then we would attach a new ENI and allocate IPs to it. We always want to pack as many pods to an ENI as possible.
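The packing order described here can be sketched as follows. This is a hypothetical illustration, not CNI code: `pack_ips`, `ips_per_eni`, and `max_enis` are made-up names, and the capacities used in the example calls are illustrative rather than real instance-type limits.

```python
def pack_ips(needed, ips_per_eni, max_enis):
    """Place `needed` IPs, filling each ENI before attaching another.

    Returns a list with the number of IPs placed on each ENI.
    Raises RuntimeError if the node runs out of ENI capacity.
    """
    enis = []
    remaining = needed
    while remaining > 0:
        if enis and enis[-1] < ips_per_eni:
            # Fill the most recently attached ENI first.
            take = min(remaining, ips_per_eni - enis[-1])
            enis[-1] += take
            remaining -= take
        elif len(enis) < max_enis:
            # Current ENI is full (or none exists yet): attach a new one.
            enis.append(0)
        else:
            raise RuntimeError("node is out of ENI capacity")
    return enis


print(pack_ips(5, 9, 3))   # fits on a single ENI
print(pack_ips(12, 9, 3))  # second ENI attached only after the first is full
```

Under this ordering, a low-density node uses one ENI no matter how many separate IP requests arrive, so ENI limits only matter once the per-ENI IP capacity is exhausted.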

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days

Closing now that the README has been updated to cover this behavior: https://github.com/aws/amazon-vpc-cni-k8s?tab=readme-ov-file#minimum_ip_target-v160
