aws/containers-roadmap

[EKS]: Next Generation AWS VPC CNI Plugin

Closed this issue · 59 comments

Edit 8/3/2020: see the comment below for an update on the status of this feature. There will not be a single new plugin release, but rather a series of new features on the existing plugin.

We are working on the next version of the Kubernetes networking plugin for AWS. We've gotten a lot of feedback around the need for adding Kubenet and support for other CNI plugins in EKS. This update to the VPC CNI plugin is specifically built to solve the common limitations customers experience today with the AWS VPC CNI version 1 and other Kubernetes CNI plugins.

Notably:

  • Limited pod density per worker node
  • Need to re-deploy nodes to update CNI and route configurations
  • Static allocation of CIDR blocks for pods

Architecturally, the next generation VPC CNI plugin differs from the existing CNI plugin. The new plugin cleanly separates functionality that was tightly coupled in the existing CNI:

  1. Wiring up the network for pods that runs on the Kubernetes worker nodes (data plane)
  2. Management of underlying EC2 networking infrastructure (control plane)

Pod networking (data plane) will continue to be part of the worker nodes, but the management of the networking infrastructure will be decoupled into a separate entity that will most likely run on the Kubernetes control plane. This will allow dynamic configuration of the CNI across a cluster, making it easy to support advanced networking configurations and change the networking configuration of the cluster on a per-service basis without restarting nodes.

These new functional behaviors are all supported while maintaining conformance to the standard Kubernetes network model requirements.

We think this CNI design will give customers the power and flexibility to run any workload size or density on a Kubernetes cluster using a single CNI plugin. We plan to implement this as the standard CNI for Amazon EKS, and release it as an open source project so that anyone running Kubernetes on AWS can utilize it.

The advantage of this approach is that it supports multiple networking modes and allows you to use them on the same cluster at the same time. We think these will be:

  1. Assign VPC secondary IP addresses to pods like the VPC CNI plugin does today.
  2. Allow pods to use IPs defined by a CIDR block assigned directly to a node. This is a separate CIDR range, distinct from the node's network. This will give you very high pod-per-node density – e.g. 256 or more pods on a [x].large EC2 instance without consuming your VPC IP space.
  3. Assign ENIs directly to pods. This mode takes advantage of EC2 ENI trunking and enables you to use all ENI features within the pod, such as assigning a security group directly to a pod.

You will be able to change which networking mode is used for pods on any given node and adjust the CIDR blocks used to assign IPs at any time. Additionally, the same VPC CNI plugin will work on both Linux and Windows nodes.

We're currently in the design and development stage of this plugin. We plan to release a beta of the CNI in the coming months. After this new CNI is generally available, we'll make it available in EKS. We do not plan to deprecate the current CNI plugin within EKS until we achieve parity between both generations of CNI plugins.

Let us know what you think below. We'll update this issue as we progress.

Does IPv6 support feature in the new design? I'd like to be able to run a dual stack network, assigning both an IPv4 and an IPv6 address to each pod. In this configuration, the behaviour of Kubernetes is that it will use the IPv4 address for things like service endpoints but it would allow pods to connect to external IPv6 sites.
I did try patching the existing aws-vpc-cni to support this and identified a number of issues such as:

  • IPv6 was disabled in the EKS AMI
  • DHCPv6 would attempt to assign all the IPv6 addresses allocated to the EC2 instance to the primary network interface (meaning they could not be moved to the pods while DHCPv6 was running).
  • The AWS VPC CNI could only assign one IP address to a pod.
  • The AWS VPC CNI could not request IPv6 addresses from the AWS API.

I have managed to get it working with some manual poking.
sftim commented

assigning a security group directly to a pod

Definitely looking forward to this feature; there's plenty of uses for it.

@gregoryfranklin yes. While we are not currently planning to support IPv6 in the initial release, we believe this design is extensible and will allow us to support IPv6 in the future. We're interested in learning more about the need for dual stack; I think this is a separate networking mode that we will need to consider.

Interested in learning more about the need for dual stack

Dual stack is a migration path to IPv6-only.

We have several EKS clusters connected to a larger internal network via direct connects (hybrid cloud). IP address space is something we are having to start thinking about. It's not an immediate problem, but it will be in the next few years, which means we are having to think about migration paths now.

For ingress, traffic comes through an ELB which can take inbound IPv6 traffic and connect to an IPv4 backend pod. However, for egress the pods need to have an IPv6 address to connect to IPv6 services (in addition to an IPv4 address to connect to IPv4 services).

Dual stack pods would allow us to run parts of the internal network IPv6-only. For example, a webapp running in EKS could use an IPv6 database in our own datacentres.

Being able to expose our apps to IPv6 traffic is an important step in identifying and fixing IPv6 bugs in our code and infrastructure (of which we have many). Also it stops developers from introducing new IPv6 bugs.
Full IPv6 support is expected to take several years after enabling support at a network level. It's therefore important to us that we have IPv6 support at the network level so that we can work on the layers above.

+1 for IPv6 support due to IPv4 exhaustion. Especially when scaling EKS to a higher number of nodes, additional CIDRs have to be added to the VPC. IPv6 would be a perfect fix for this and would make higher-density EKS clusters easier to run.

sftim commented

Main reason I would want IPv6 is to run a cluster that is IPv6 only. Right now that's not something that Kubernetes itself supports very well; however, things seem to be catching up fast.

To handle connections from the public dual-stack internet, you could use Ingress, Proxy Protocol, etc (similar to how a typical cluster today maps from public IPv4 to private IPv4).

Possibly a SOCKS or HTTP proxy for outbound traffic too, which would allow access to IPv4-only APIs.

We are very much enthusiastic about this next-gen plugin that would benefit us greatly:

  • the 'higher pod density for small instances' part, as we are running nodejs microservices where a single smaller instance (e.g. t2.medium) is perfectly fine running 30-50 pods resource-wise, but the current CNI plugin imposes a pod limit that results in highly under-utilized nodes. That makes it hard to justify EKS compared to alternatives. We'd prefer a managed control plane on AWS though.

  • the native 'security group per pod' part, as it would (hopefully) reduce user-facing complexity compared to kube2iam

So to summarize, this proposal is something we are greatly anticipating, and IMHO this sounds much more like a production-ready 1.0 CNI plugin from AWS compared to the previous one (that sadly doesn't really work for us microservice guys).

Keep up the good work!

@tabern Will the per-node pod CIDRs be implemented using kubenet or more like a full-blown overlay network?
Will this impose any limitations on the CNI used for NetworkPolicy, e.g. Calico or Cilium v1.6?

security goal that'd be useful:

no matter the mode (i.e. ENI trunking or secondary IP approach) or user configuration (e.g. lack of Network Policies through, let's say, Calico), the CNI should prevent Pods from accessing the host's metadata endpoint. this is a common issue seen in practice, which results in unintended credential exposure.

seems straightforward to solve with an iptables rule at the node when setting up a container's veth pair in https://github.com/aws/amazon-vpc-cni-k8s/blob/6886c6b362e89f17b0ce100f51adec4d05cdcd30/plugins/routed-eni/driver/driver.go (i.e. block traffic to 169.254.169.254 from that veth interface), for the general case. I am not familiar with ECS trunking, so I cannot suggest an approach there.

note that this rule construction is kube2iam's general approach https://github.com/jtblin/kube2iam#iptables, though it doesn't drop the traffic from the Pod outright, due to its feature set. they use a neat 'glob' I wasn't aware of, so you wouldn't need to create a rule per-veth at creation time (i.e. eni+ to match all after that prefix).
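For illustration, a minimal sketch of that node-level rule shape, assuming the CNI names the host side of each pod's veth pair with an "eni" prefix so the eni+ wildcard matches them all; the node IP and port in the kube2iam variant are placeholders:

# Drop forwarded traffic from any pod veth towards the instance metadata service.
iptables -I FORWARD -i eni+ -d 169.254.169.254/32 -j DROP

# kube2iam's variant instead intercepts the same traffic in the nat table and
# redirects it to its own proxy rather than dropping it outright, e.g.:
# iptables -t nat -I PREROUTING -i eni+ -p tcp -d 169.254.169.254 --dport 80 \
#   -j DNAT --to-destination <node-ip>:8181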

any update on this @tabern ?

Is there a repository where the progress of this next-gen cni code can be viewed and tracked?

Any thoughts around adding support for enforcing network policies to the cni plugin? It would be great if security groups could be used in the ingress/egress rules for network policies.

We have certain use-cases where we need to expose the pods directly to the public internet, so they need a public IP (WebRTC, STUN/TURN). It would be an awesome feature if the new CNI plugin would be able to assign a public IP or EIP to pods (e.g. when a certain annotation on the pod is given) and also put the assigned IP into some status field or annotation of the pod.

Currently we are working around this by using autoscaling groups with node taints (dedicated=xyz:NoSchedule) that assign public IPs to the instances. The instances in these autoscaling groups don't have the CNI plugin enabled and are only used by pods with host networking enabled.

@tabern any timeline on this feature?

@eightnoteight we're continuing to work on this project. Nothing else to share right now per the rules of this roadmap. :)

I am here just to say that the limited pod-per-node density is going to become a significant problem for us in the next 3 months, because we are going to roll out many small services on our t3.large nodes.

I am sure that everyone appreciates the effort that is almost certainly going into this feature. Thank you for all your hard work!

Just wanted to mention that, in addition to the features outlined in the description above, support for NetworkPolicy resources would be amazing as native support for them is dearly missed right now.

We're using <5% of our cluster CPU capacity at this point but have to keep scaling out nodes just to ensure available IPs. I'd love to see this limitation resolved.

Since the progress on this has been kind of 'hush hush', I thought this was likely going to be announced at re:Invent. Now that re:Invent has passed, can you provide any additional information on the progress of this plugin @tabern?

This is a major pain point for us too - we replicate a complex third party software stack that runs as a cluster of containers, each needing its own IP address. Our EC2 instances are at fractional CPU usage and high IP address usage, and it's painful to keep having to scale out the cluster due to the artificial limitations on IP addresses imposed by the ENIs.

This feature cannot come soon enough for us.

I see that the roadmap CNI design puts the pod IP and other information directly into pod annotations, so the CNI plugin can read that information from the pod annotations via the kube-apiserver.

So I'm confused about the configuration file /etc/cni/net.d/eni.conf. We could put more information into pod annotations to make pod network configuration more flexible. If so, will more of the settings in eni.conf be moved to pod annotations or a CRD in the future?

Can you provide any additional information on the progress of this plugin? We need it really badly; it is blocking us from migrating our production workloads into EKS.

We need more information about the progress. The number of IPs that pods consume in my private subnets is a big problem, as is the inefficient resource utilization of the nodes.

Note that it's only the AWS VPC CNI that has this pod density per worker node issue. All other CNIs allow a much more generous number of pod IPs per node.
Most other CNIs use a private overlay network which is not accessible from outside the Kubernetes cluster. Normally you do not connect directly to Kubernetes pods, and all traffic enters the cluster via a Service with type=LoadBalancer or NodePort.

@morganchristiansson the biggest issue I've run into with other CNIs is that in-cluster admission controllers won't work anymore. kubectl proxy will also cease to work.

The root cause of both of those issues is that the master nodes won't know how to route to the overlay network.

Any updates? Given the magnitude of this, I'm sure a lot of people were hoping for a more proactive communication approach.

This is really frustrating. I think we can all understand things being delayed, but not getting any updates for months makes it feel like AWS does not take EKS seriously (which is also shown by the same lack of updates on other important issues in this project)

@tabern Does this scope in any way include some option for overlay/NAT'd networking, or an option to easily use some of the standard CNIs ? In our hybrid setup for example, we'd prefer if we only had to tunnel the (much smaller) range of node IPs to our datacenters, instead of the large pod range.

I recently evaluated Managed Node Groups with the mistaken assumption that deleting the aws-node daemonset and installing a CNI with alternate IPAM like Calico would remove the max pod limit. I didn't realize that a bootstrap argument was required, and that Managed Node Groups do not support bootstrap arguments. So, this essentially means that anyone who requires more than the max pod limit per node is effectively limited to only unmanaged worker nodes at this point.

I thought I would post here for the benefit of anyone else considering Managed Node Groups, since many people will ultimately find the usefulness limited until this next-gen CNI is available.

There are indeed hacks like creating a daemonset that runs a script to update the node configuration (with a chroot to the host), or manually SSHing into the nodes. I created a support ticket for official advice about how to work around this issue and was informed that any such modifications to update the max pod limit are considered out of band, may introduce inconsistencies, and are not recommended or supported.

EDIT: Updated for clarity

zot24 commented

@bencompton that's great input, thank you! Do you mind sharing the link to the ticket you created?

Nearing the 6 month mark on the last update from the AWS team

@tabern Checking to see how close the EKS team is to delivering this much-needed feature. IP exhaustion is a big problem for our organization, and EKS custom CNI networking using the AWS VPC CNI plugin does not scale well for us.

Hi AWS team, do you have any update on this?

Will the next gen CNI implement Kubernetes Network Policies? E.g. by configuring VPC Security Groups and assigning them to Pod ENIs, or to Pod IP/CIDR, or another approach?

Right now we have to rely on the third party Calico option, which is an instance/kernel based option and can't be used with EKS Fargate. If Kubernetes Network Policy support is out-of-scope for the next-gen CNI, it would be great to get better support for Calico from AWS. Right now there are installation instructions, but when I asked support about documentation or procedures for Calico upgrades on EKS, they were helpful but pointed out that Calico is not supported by AWS.

Assuming the next gen CNI plugin will support network policies, are there any plans to support useful extensions like FQDN filtering and DNS based network policies?

For example, something similar to Cilium:

 egress:
  - toFQDNs:
    - matchPattern: "*.twitter.com" 

Hey everyone,

We appreciate all the feedback and we’ve been listening. We are working on a number of CNI improvements, and want to share more details on the upcoming roadmap. We plan to release features in a staged rollout, rather than having a single VPC CNI release that will include all functionality as originally listed in this issue. Below is a list of the improvements we are working on, in approximate order of planned release.

Security groups per pod
Coming soon, you will be able to assign security groups directly to pods. We will release a new controller (the VPC resource controller) running on the Kubernetes control plane that integrates with EC2 ENI trunking. At launch, you’ll be able to assign security groups to pods on EC2 worker nodes, and we will eventually add support for Fargate. With this feature, you’ll implement network security rules outside of the cluster in EC2 security groups, and then be able to assign those security group IDs to pods using a new custom resource definition. Note that this is different from an implementation of Kubernetes network policies. Separately, we are also in the early stages of exploring building our own network policy enforcement controller.
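For illustration, here is a minimal sketch of what assigning security groups through the new custom resource definition looks like, based on the security-groups-for-pods documentation linked later in this thread; the security group ID, namespace, and labels are placeholders:

# Hypothetical SecurityGroupPolicy manifest; pods matching the selector get a branch ENI
# with the listed security groups attached.
cat <<'EOF' | kubectl apply -f -
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: example-sg-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: example               # placeholder pod label
  securityGroups:
    groupIds:
      - sg-0123456789abcdef0     # placeholder security group ID
EOF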

Simplifying CNI custom networking
CNI custom networking allows pods to run on a separate subnet from nodes, including subnets from secondary CIDR blocks, enabling you to create an environment where pods no longer consume any IPv4 addresses from your primary VPC CIDR block. This feature helps solve IPv4 exhaustion challenges, but it suffers from the usability issues listed below (a minimal sketch of the current manual setup follows the list):

  • Setting up secondary VPC CIDR blocks can be time consuming, and requires a string of EC2 API calls.
    • We’ll address this by adding secondary CIDR support to eksctl, as well as other automation features such as vending CloudFormation templates to set up a VPC with a secondary CIDR range for pods. We’ll continue to add documentation and blog content like this, to help you configure the VPC CNI in IPv4-constrained environments.
  • Max pods must be manually calculated and passed to kubelet of worker nodes.
    • We’ll automate this process so users don’t need to manually calculate a value that depends on the networking mode. This will unlock CNI custom networking with Managed Node Groups.
  • ENIConfigs must be created for each availability zone.
    • We’ll address this with a tagging based option that will automatically discover secondary CIDR subnets that should be used by pods.
  • Enabling custom networking requires updating aws-node daemonset environment variables, which can be overwritten when following the default VPC CNI upgrade instructions.
    • We’ll address this by moving VPC CNI configuration to a ConfigMap.
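For context, a minimal sketch of the manual custom networking flow that the items above aim to simplify, assuming the AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG and ENI_CONFIG_LABEL_DEF settings of the current VPC CNI; the zone name, subnet ID, and security group ID are placeholders:

# 1. Enable custom networking on the aws-node daemonset and have it select the
#    ENIConfig whose name matches the node's availability-zone label.
kubectl set env daemonset aws-node -n kube-system \
  AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true \
  ENI_CONFIG_LABEL_DEF=topology.kubernetes.io/zone

# 2. Create one ENIConfig per availability zone, pointing at a subnet carved from
#    the secondary CIDR block.
cat <<'EOF' | kubectl apply -f -
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-west-2a               # must match the zone label value of nodes in this AZ
spec:
  subnet: subnet-0123456789abcdef0
  securityGroups:
    - sg-0123456789abcdef0
EOF

# 3. Recalculate and pass --max-pods to the kubelet, since the primary ENI no longer
#    hosts pods (worked examples appear later in this thread).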

Increased pod density
We will integrate the VPC CNI plugin with an upcoming VPC feature that allows for additional blocks of secondary IPv4 addresses to be added to Elastic Network Interfaces. This will allow for all worker nodes to support at least the Kubernetes recommended pods per node thresholds (min(110, 10*#cores)). This includes worker nodes using CNI custom networking, where the primary ENI is not used for pods.

Security Improvements
Today, the VPC CNI plugin includes a daemon that runs on every worker node, which makes EC2 API calls to configure networking reachability for pods. As mentioned above, the security groups for pods release will include a controller running on the Kubernetes control plane for ENI trunking/branching, and we plan to eventually migrate all ipamd functionality out of the CNI plugin into this separate controller, removing the need for each node in your cluster to have broad EC2 API permissions.

IPv6
We feel the best solution for IP exhaustion is IPv6, and we plan to invest heavily in this area over the rest of this year and into 2021, with the end goal of having the VPC CNI plugin support IPv6 only pods. There will be multiple milestones along the way, which you can learn more about and leave feedback on in this GitHub issue.

We believe that this feature roadmap will address the majority of networking challenges present today, however, we also realize that a single CNI plugin is unlikely to meet every possible use case, and to that end we have been working closely with our partners that maintain alternate compatible CNI plugins. These partners have developed EKS specific landing pages along with details on how to obtain commercial support, which we have highlighted in our documentation.

To help us better track interest in particular features, we have created and linked each separate roadmap item as an issue below.

Security groups per pod #177
Simplified CNI custom networking #867
Increased Pod Density #138
VPC CNI configuration settings from ConfigMap #865
Remove requirement for EC2 permissions on aws-node VPC CNI daemon #866
Support for IPv6 #835

We’ll leave this issue open for general feedback on the CNI roadmap, but please add +1s to the specific GitHub feature requests that matter most to you.

We believe that this feature roadmap will address the majority of networking challenges present today, however, we also realize that a single CNI plugin is unlikely to meet every possible use case, and to that end we have been working closely with our partners that maintain alternate compatible CNI plugins. These partners have developed EKS specific landing pages along with details on how to obtain commercial support, which we have highlighted in our documentation.

Is it ever going to be possible to use one of these partner CNIs with AdmissionWebhooks? E.g. routable from the API server to the overlay network?

I appreciate the recent update from @mikestef9, but I still have no sense of what this means in terms of timing. Our org has desperately wanted to switch to EKS for various reasons, but node density and CNI custom networking improvements are must-haves for us. I'm not expecting exact dates, but it feels like these improvements have been in the "coming months" stage for over a year. If these improvements aren't rolled out by, say, EOY, it's quite probable we'll have to skip our EKS plans altogether.

@MarcusNoble
I think one workaround is to set up a managed ingress like an AWS ALB and have the admission webhooks come in via the ALB. We are using this method even with the AWS CNI to get more visibility around the callbacks.
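For illustration, a minimal sketch of that workaround for a webhook you control: register the webhook with an external URL that resolves to the ALB instead of an in-cluster service reference the API server cannot reach over the overlay. The webhook name, host, and path are placeholders, and the ALB needs a certificate valid for that host:

cat <<'EOF' | kubectl apply -f -
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: example-webhook
webhooks:
  - name: validate.example.com
    admissionReviewVersions: ["v1"]
    sideEffects: None
    clientConfig:
      url: "https://webhooks.example.com/validate"   # DNS name pointing at the ALB
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
EOF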

@eightnoteight That only really helps where you're managing the webhooks yourself. We've got a few third-party applications used in our clusters that set up webhooks for themselves, so we'd need to manually modify the manifests of those applications or, in the case where they're created in code at runtime, fork and update the application. 😞

@mikestef9 Will an overlay network be an option for pod networking, or will it also be ENI based?

No, we will not be building any overlay option into the VPC CNI plugin, as that strays quite a bit from the original design goals of the plugin and will add too much complexity. Custom networking is our "overlay-like" option in the VPC CNI, but as I mentioned above, "we also realize that a single CNI plugin is unlikely to meet every possible use case", and added links to our docs that do list alternate CNI plugins with overlay options.

We feel the best solution to IPv4 exhaustion is IPv6, and that's where we are investing with the VPC CNI plugin.

How do Increased Pod Density and Security Groups Per Pod interoperate? Will they be compatible with each other? I saw a comment mentioning a limit of 50 ENIs per node when it comes to VLAN tagging.

@mikestef9 I'm glad you are acknowledging that the VPC CNI cannot meet every possible use case, and I'm grateful for the documentation that has been added on how to install alternate CNIs on EKS. However, all of these alternate CNIs have the limitation that they cannot be installed on to the control plane master nodes, which I am sure you are aware of. This means that things like admission controller webhooks will fail, as well as other things that require a control plane node to communicate with a pod on a worker node. Are there any plans in place to fix this problem to allow 3rd party CNIs to be fully functional?

Hi @mikestef9, is there any documentation available for configuring pods with security groups via the related custom resource definition?
Since this feature is available in the latest VPC CNI (1.7.1), I would like to understand more about configuring SGs per pod in EKS.
I want to try this feature of the VPC CNI.
Please suggest any documentation available for this.

Documentation is published

https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html

Stay tuned for further updates on #177

Thanks, @mikestef9 for sharing

snese commented

Nice!

Does @mikestef9 have any timeline for security-groups-for-pods on Fargate? It would be useful for planning our migration.

You can follow this issue #625 for updates on that feature request. No timeline to share right now. Note that the UX we have in mind there will be the same as the SecurityGroupPolicy CRD for worker nodes, and not something that is added to the Fargate Profile

@bencompton Sorry for resurrecting an old comment, but I wanted to post a solution to this problem here since no one else has yet.

Setting the max number of pods per node is a native kubelet functionality, see --max-pods. The AWS documentation suggests setting this value by passing something like --use-max-pods false --kubelet-extra-args '--max-pods=20' to the bootstrap.sh script. The bootstrap.sh script takes the received value and sets it into the kubelet config file using jq here.

It is not possible to pass the documentation suggested arguments to the bootstrap.sh script with managed worker nodes, however, it is possible to add custom userdata to a launch template that is utilized by your managed worker nodes. There are some requirements for the formatting of the userdata that are not typical, so make sure to familiarize yourself with the specifics here.

So to set a custom maxpods value you need to do 2 things:

  1. set USE_MAX_PODS to false when bootstrap.sh executes to prevent a maxPods value from being set in the kubelet config file
  2. set a custom maxPods value into the kubelet config file as done here

Here is my userdata which implements these 2 tasks:

#!/bin/bash
set -ex

BOOTSTRAP_SH=/etc/eks/bootstrap.sh
BOOTSTRAP_USE_MAX_PODS_SEARCH="USE_MAX_PODS:-true"
KUBELET_CONFIG=/etc/kubernetes/kubelet/kubelet-config.json
MAX_PODS=20 # put whatever quantity you want here

# set a maxPods value in the KUBELET_CONFIG file
echo "$(jq ".maxPods=$MAX_PODS" $KUBELET_CONFIG)" > $KUBELET_CONFIG

# search for the string to be replaced by sed and return a non-zero exit code if not found. This is used for safety in case the bootstrap.sh
# script gets changed in a way that is no longer compatible with our USE_MAX_PODS replacement command.
grep -q $BOOTSTRAP_USE_MAX_PODS_SEARCH $BOOTSTRAP_SH

# set the default for USE_MAX_PODS to false so that the maxPods value set in KUBELET_CONFIG will be honored
sed -i"" "s/$BOOTSTRAP_USE_MAX_PODS_SEARCH/USE_MAX_PODS:-false/" $BOOTSTRAP_SH

This is a workaround that works for now. This is certainly not recommended by AWS, and could break at some point in time depending on updates made to the bootstrap.sh script. So use this method with caution. Eventually this should no longer be needed based upon this comment from @mikestef9 above and #867:

  • Max pods must be manually calculated and passed to kubelet of worker nodes.
    • We’ll automate this process so users don’t need manually calculate a value that is dependent on networking mode. This will unlock CNI custom networking with Managed Node Groups.

This is my current workaround for CNI custom networking with MNG (managed node groups). It is dynamic, but requires access to IMDS, the EC2 API, and the internet to install bc for the calculation (it could be done with the built-in Python as well, for sure ;-) ):

Custom launch template user data

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
#Amazon Linux 2 based script
#to determine instance type from instance metadata and calculate max pods for CNI custom networking
#and set this inside EKS bootstrap script

#install bc, requires internet access
yum -y install bc

#gather instance type from metadata
INST_TYPE=$(curl -s http://169.254.169.254/latest/meta-data/instance-type)

#gather region from metadata, jq is pre-installed
export AWS_DEFAULT_REGION=$(curl -s http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .region)

#gather ENI info, aws CLI is pre-installed, requires internet access
ENI_INFO=$(aws ec2 describe-instance-types --filters Name=instance-type,Values=$INST_TYPE --query "InstanceTypes[].[InstanceType, NetworkInfo.MaximumNetworkInterfaces, NetworkInfo.Ipv4AddressesPerInterface]" --output text)

#calculate max-pods
MAX_ENI=$(echo $ENI_INFO | awk '{print $2}')
MAX_IP=$(echo $ENI_INFO | awk '{print $3}')
MAX_PODS=$(echo "($MAX_ENI-1)*($MAX_IP-1)+2" | bc)

sed -i 's/^USE_MAX_PODS=.*/USE_MAX_PODS="false"/' /etc/eks/bootstrap.sh
sed -i '/^KUBELET_EXTRA_ARGS=/a KUBELET_EXTRA_ARGS+=" --max-pods='$MAX_PODS'"' /etc/eks/bootstrap.sh

--==MYBOUNDARY==--

Hi,

Any update on when we can expect IPv6 support for EKS? Also, is there any workaround to get dual stack support for pods in EKS?

Regards,
Gaurav

@mikestef9 Unfortunately this is no longer on the roadmap?

@davidroth I think it was essentially replaced/broken up into smaller features - like IPv6 support, higher IP density for pods on nodes, etc. So there's no longer going to be an explicit switch to a brand new plugin, but rather continuous improvements to the existing one :)

Edit: it was touched upon here: #398 (comment)

As the feature is now GA -- see https://aws.amazon.com/jp/blogs/containers/amazon-vpc-cni-increases-pods-per-node-limits/ for details -- suggest closing this issue.
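For anyone landing here, a minimal sketch of enabling that GA capability (ENI prefix assignment) on the VPC CNI, assuming a CNI version that supports the ENABLE_PREFIX_DELEGATION setting; note that the kubelet's max-pods still has to be raised separately to take advantage of the extra addresses:

# Switch the aws-node daemonset to allocate /28 IPv4 prefixes per ENI slot instead of
# individual secondary IPs, greatly raising the per-node address ceiling.
kubectl set env daemonset aws-node -n kube-system ENABLE_PREFIX_DELEGATION=true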

@FlorianOtel the last outstanding pain point discussed originally in this issue is IPv4 exhaustion. I plan on closing once we launch IPv6 support #835

With regards to the VPC CNI plugin, and especially around Windows support, it would be greatly helpful if the documentation and troubleshooting guides were updated to cover how you're supposed to debug the new wiring, rather than, as at present, covering how to debug the older webhooks/controller version. (If there's a separate repo for documentation, please let me know.)

Part of this would appear to be working on and completing several aged PRs in the CNI repo which help to address the way the CNI setup fails silently / without feedback.

Closing as we have now released native VPC CNI features to address all of the initial pain points discussed in this issue.

  • pod density #138
  • pod level security groups #177
  • IPv4 exhaustion #835