kubernetes/cloud-provider-aws

Multiple ENIs confuse the cloud-provider-aws controller

MadJlzz opened this issue · 9 comments

What happened:

I am deploying a Kubernetes cluster using Cluster API, with amazon-vpc-cni as the cluster's network plugin.

During my tests I observed some pretty strange behaviour from the cloud-provider-aws controller.

The Kubernetes Node object's internal IP changed from the private IP of the instance's primary ENI (the one provisioned when the instance was created) to the private IP of an ENI provisioned later by the AWS VPC CNI controller. I also saw that this behaviour was non-deterministic: it did not happen on every run.

This causes several problems: commands such as kubectl logs and kubectl exec stop returning results, because kube-apiserver forwards those requests to the node hosting the pod using the internal IP read from the Node resource.

What I cannot explain, though, is why this secondary private IP, attached to the same instance, did not answer those calls properly, even though the firewall allowed all traffic from any source.

I've implemented a workaround: I fetch the node's primary IP at runtime and pass it to the kubelet via the --node-ip flag before starting it.

To make sure cloud-provider-aws doesn't override this, I've also set the --allocate-node-cidrs=false flag.
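For reference, a workaround like the one described above can be sketched as follows. This is not the reporter's exact script; it's a minimal illustration assuming IMDSv2 is enabled on the instance. The local-ipv4 metadata path returns the primary private IPv4 of the primary ENI (device index 0), so it is stable regardless of any ENIs the CNI attaches later.

```shell
# Fetch an IMDSv2 session token, then the primary ENI's private IPv4.
TOKEN=$(curl -sf -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
NODE_IP=$(curl -sf -H "X-aws-ec2-metadata-token: $TOKEN" \
  "http://169.254.169.254/latest/meta-data/local-ipv4")

# Pin kubelet to that IP; how the flag is wired into the kubelet unit
# (systemd drop-in, KUBELET_EXTRA_ARGS, etc.) is site-specific.
kubelet --node-ip="$NODE_IP"
```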

What you expected to happen:

Once the Node object's internal IP is set, it should not be replaced by the IP of another ENI. Alternatively, using the other IP should not be a problem, in which case this becomes a networking issue for the CNI team.

Anything else we need to know?:

Here's a screenshot showing the behaviour. The top pane shows the initial state, and the bottom pane shows the changed IPs after I deployed aws-vpc-cni and the cloud-provider-aws controller.

Screenshot from 2024-03-28 16-25-41

Environment:

  • Kubernetes version (use kubectl version):
Client Version: v1.29.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.0
  • Cloud provider or hardware configuration: v1.29.1
  • OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
  • Kernel (e.g. uname -a):
Linux ip-10-0-1-97 5.15.0-1056-aws #61~20.04.1-Ubuntu SMP Wed Mar 13 17:40:41 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: helm with values file
image:
    tag: v1.29.1

args:
  - --v=2
  - --allocate-node-cidrs=true
  - --cloud-provider=aws
  - --cluster-name="k993aws"
  - --cluster-cidr="10.0.0.0/16"
  - --configure-cloud-routes=false

/kind bug

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Once the Node object internal IP is set, it should not be replaced by the IP of the other ENI.

I don't think the IP is replaced. kubectl shows just the one IP, but the Node object should have all of the IPs, as that is the default behavior, and they should be ordered by interface number:

// extractIPv4NodeAddresses maps the instance information from EC2 to an array of NodeAddresses.
// This function will extract private and public IP addresses and their corresponding DNS names.
func extractIPv4NodeAddresses(instance *ec2.Instance) ([]v1.NodeAddress, error) {
    // Not clear if the order matters here, but we might as well indicate a sensible preference order
    if instance == nil {
        return nil, fmt.Errorf("nil instance passed to extractNodeAddresses")
    }
    addresses := []v1.NodeAddress{}
    // sort by device index so that the first address added to the addresses list is from the first (primary) device
    sort.Slice(instance.NetworkInterfaces, func(i, j int) bool {
        // These nil checks should cause interfaces with non-nil attachments to sort before those with nil attachments
        if instance.NetworkInterfaces[i].Attachment == nil {
            return false
        }
        if instance.NetworkInterfaces[j].Attachment == nil {
            return true
        }
        return aws.Int64Value(instance.NetworkInterfaces[i].Attachment.DeviceIndex) < aws.Int64Value(instance.NetworkInterfaces[j].Attachment.DeviceIndex)
    })
This is from cloud provider release 1.29.3. Can you upgrade and test?

The node controller will make sure that the instance's addresses always match the Node object's addresses: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/cloud-provider/controllers/node/node_controller.go#L193-L197.

What I cannot explain, though, is why this secondary private IP attached to the same instance did not answer those calls properly, even though the firewall allowed all traffic from any source.

What is the error you are facing? Did you check the apiserver logs for the reason? It could also be due to cert verification.

Do you pass the --node-ip flag to kubelet?

What is the error you are facing? Did you check the apiserver logs for the reason? It could also be due to cert verification.

It's been quite some time; I'll have to dig back into it to get extra details. I had problems getting back results from commands like kubectl logs or kubectl exec, which are proxied by the api-server to the target node's kubelet.

Do you pass the --node-ip flag to kubelet?

I had to do that as a workaround, yes. The IP I set is the primary IP of the EC2 instance's initial network interface.

As soon as I have time, I'll try to gather more information and post it here.

There have been some recent discussions about --node-ip and how the external CCM should handle it. At this point, passing --node-ip to kubelet is the right thing to do, for AWS at least. Here's how we do it for the AL2-based EKS AMI: https://github.com/awslabs/amazon-eks-ami/blob/e50acfb7e6be088dde823dc80b21c50651e71b01/templates/al2/runtime/bootstrap.sh#L490-L495

More: kubernetes/kubernetes#125337

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale