aws/amazon-vpc-cni-k8s

Ability to set pod MTU separate from ENI MTU (or eth0)

archoversight opened this issue · 16 comments

What would you like to be added:

I'd like to be able to set the MTU of my pods' virtual interfaces lower than the MTU of the host's eth0. I am running an IPv6-only EKS cluster and am attempting to deploy Cilium on top with WireGuard encryption so that pod-to-pod traffic is transparently encrypted.

Unfortunately, because of the WireGuard tunnel overhead and the lack of working path MTU discovery, traffic is currently dropped silently when a pod sends packets sized for an MTU of 9001 while the WireGuard tunnel MTU is 8921.
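
The 8921 figure is consistent with WireGuard's per-packet overhead over an IPv6 underlay. A rough sketch of the arithmetic, using standard header sizes (how Cilium actually derives its tunnel MTU is an assumption here and may differ):

```go
package main

import "fmt"

// Per-packet overhead WireGuard adds when the outer transport is IPv6.
// Standard header sizes; Cilium's exact MTU calculation is assumed, not confirmed.
const (
	ipv6HeaderLen      = 40 // fixed IPv6 header
	udpHeaderLen       = 8  // UDP header carrying the WireGuard message
	wireguardHeaderLen = 16 // data message header: type, reserved, receiver index, counter
	wireguardTagLen    = 16 // Poly1305 authentication tag
)

// wireguardOverheadIPv6 returns the bytes added to every tunneled packet.
func wireguardOverheadIPv6() int {
	return ipv6HeaderLen + udpHeaderLen + wireguardHeaderLen + wireguardTagLen
}

// tunnelMTU returns the largest inner packet that fits in one outer packet
// on a link with the given MTU.
func tunnelMTU(linkMTU int) int {
	return linkMTU - wireguardOverheadIPv6()
}

func main() {
	const eniMTU = 9001
	fmt.Println("WireGuard overhead:", wireguardOverheadIPv6()) // 80
	fmt.Println("tunnel MTU:", tunnelMTU(eniMTU))              // 8921
}
```

So any pod packet larger than 8921 bytes cannot enter the tunnel in one piece, and without path MTU discovery the sender never learns to shrink it.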

I tried setting AWS_VPC_ENI_MTU to 8000, for example, but at startup it also changes the MTU of the host's eth0, which is not what I want.

Why is this needed:

When chaining CNIs that add transparent encryption or encapsulation on IPv6 hosts where path MTU discovery does not work, it would be nice to have an escape hatch.

References:

cilium/cilium#28413 (comment)
cilium/cilium#28387
https://aws.amazon.com/blogs/containers/transparent-encryption-of-node-to-node-traffic-on-amazon-eks-using-wireguard-and-cilium/

@archoversight setting AWS_VPC_ENI_MTU should set the MTU for the pod's virtual interfaces. eth0 is the name of the pod veth endpoint in the pod networking namespace. Are you not seeing the MTU get set on that interface?


I am using EKS in IPv6 mode; eth0 (host) has a prefix delegated to it. I am looking at the host (from a pod launched with kubectl debug), and it shows that eth0's MTU is changing, not just the MTU of the veths that are created.

I am seeing the MTU set on the veth interfaces correctly, but I also see the MTU change on eth0.

@jdn5126 we're referring to the host eth0, not in a pod. It's true that in the pods the primary interface shows up as eth0.

The host eth0 is the primary ENI, which the VPC CNI does not manage. If you want to change the MTU on the primary ENI, you would need to do so in the AMI or in the node group template.


Have you tried this? This issue reports the opposite: the setting does control the primary ENI. I've observed it myself, separately from @archoversight, on my own cluster. Modify the config setting in the add-on, then start up new nodes, and the primary eth0 ENI on the host now has the new MTU value, along with the pods that start on that node.

@mmerickel I am under the impression that the MTU for the primary ENI should not change. I have not tried this recently, so I will test it out next week.

Sorry for the delay; I got caught up with other issues. Planning to work on this tomorrow.

@mmerickel I also see the behavior that you described, and I see the MTU for the primary ENI being set here in the code: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/networkutils/network.go#L300. This logic is called during aws-node pod initialization.

I am not sure why this is set for the primary ENI, though I wonder whether it was done to prevent IP fragmentation. A pod veth MTU smaller than the primary ENI MTU should not cause fragmentation, though. The original PR has little documentation: #676.
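
To illustrate the fragmentation point, here is a minimal sketch (the helper name is hypothetical, not from the VPC CNI codebase). A packet leaves the host unfragmented only if the pod MTU plus any encapsulation overhead fits within the ENI MTU, so a pod MTU below the ENI MTU is safe, while an equal pod MTU breaks once a tunnel adds overhead:

```go
package main

import "fmt"

// fitsWithoutFragmentation is a hypothetical helper: it reports whether a
// packet of innerMTU bytes, plus overhead bytes of encapsulation, still fits
// within a link of linkMTU bytes.
func fitsWithoutFragmentation(innerMTU, overhead, linkMTU int) bool {
	return innerMTU+overhead <= linkMTU
}

func main() {
	const eniMTU = 9001
	// Pod MTU lower than the ENI MTU, no tunnel: fits.
	fmt.Println(fitsWithoutFragmentation(8000, 0, eniMTU)) // true
	// Pod MTU equal to the ENI MTU plus 80 bytes of WireGuard/IPv6 overhead: too big.
	fmt.Println(fitsWithoutFragmentation(9001, 80, eniMTU)) // false
	// Pod MTU lowered by the overhead: fits again.
	fmt.Println(fitsWithoutFragmentation(8921, 80, eniMTU)) // true
}
```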

Perhaps it was done for consistency? Adding @jayanthvn in case he has any opinion here

This issue is stale because it has been open 60 days with no activity. Remove the stale label or comment, or this will be closed in 14 days.

/not-stale

We discussed internally, and to support this enhancement, we would need a new environment variable. Currently, AWS_VPC_ENI_MTU is set on all ENIs and pod virtual interfaces. We cannot break existing behavior, so to support the pod virtual interfaces having a lower MTU than the ENIs, we would need a new environment variable like POD_MTU, which overrides AWS_VPC_ENI_MTU for pods only if set.

This makes sense to me.

Closing as #2791 has merged. This PR will ship in VPC CNI v1.16.4, which is targeting late Feb/early March.

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.


Awesome, thank you!