aws/amazon-vpc-cni-k8s

CNI does not remove the pod network set up on a node after the IP is lost externally and IPAMD reconciles this state

AbeOwlu opened this issue · 5 comments

IPAM reconciliation:
Scenario:

  • A pod is created and assigned the IP 10.0.2.99
  • after the sandbox is fully initialized, the IP is reclaimed by an automation in the network external to the cluster
  • the IPAMD logs show an IP pool reconcile that catches this lost IP and reconciles its cache by calling the EC2 endpoint
  • the network route for this pod with IP 10.0.2.99 remains unchanged on the local node; however, peer nodes can no longer reach the pod on 10.0.2.99. The pod is still reachable from its local host, so the kubernetes liveness probes keep succeeding - keeping an unhealthy pod in the cluster

{"level":"debug","ts":"2024-03-08T18:10:50.378Z","caller":"rpc/rpc.pb.go:713","msg":"AddNetworkRequest: K8S_POD_NAME:\"liveness-http\" K8S_POD_NAMESPACE:\"gateway-ns\" K8S_POD_INFRA_CONTAINER_ID:\"7f92409d45a01365839f5db2b7c30c35626c1de02779233046bf5c1bd2c59380\" ContainerID:\"7f92409d45a01365839f5db2b7c30c35626c1de02779233046bf5c1bd2c59380\" IfName:\"eth0\" NetworkName:\"aws-cni\" Netns:\"/var/run/netns/cni-d4e752dc-bdf7-f594-2a1a-38dfa2445dfb\""}

{"level":"info","ts":"2024-03-08T18:10:50.378Z","caller":"datastore/data_store.go:750","msg":"AssignPodIPv4Address: Assign IP 10.0.2.99 to sandbox aws-cni/7f92409d45a01365839f5db2b7c30c35626c1de02779233046bf5c1bd2c59380/eth0"}

External automation event - event time:
March 08, 2024, 18:11:25 (UTC+00:00) UnassignPrivateIpAddresses  "privateIpAddress": "10.0.2.99"

{"level":"warn","ts":"2024-03-08T18:12:00.256Z","caller":"ipamd/ipamd.go:1404","msg":"Instance metadata does not match data store! ipPool: [10.0.2.99 10.0.2.27 10.0.2.158], metadata: [{\n  Primary: true,\n  PrivateIpAddress: \"10.0.2.149\"\n} {\n  Primary: false,\n  PrivateIpAddress: \"10.0.2.27\"\n} {\n  Primary: false,\n  PrivateIpAddress: \"10.0.2.158\"\n}]"}

{"level":"info","ts":"2024-03-08T18:12:00.334Z","caller":"datastore/data_store.go:578","msg":"UnAssignPodIPAddress: Unassign IP 10.0.2.99 from sandbox aws-cni/7f92409d45a01365839f5db2b7c30c35626c1de02779233046bf5c1bd2c59380/eth0"}

What you expected to happen:

  • After the event "UnAssignPodIPAddress: Unassign IP 10.0.2.99 from sandbox aws-cni/7f9240...", the CNI should be triggered to tear down the network route for this IP, so the liveness probe eventually fails and the kubelet attempts to heal this pod (a rough sketch of that teardown follows below).
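
What I mean by "tear down the network route" is removing the host-side /32 route that points 10.0.2.99 at the pod's veth. A minimal sketch of that teardown, assuming the vishvananda/netlink library the CNI already builds on (this is not the plugin's actual teardown path):

package main

import (
	"log"
	"net"

	"github.com/vishvananda/netlink"
)

// deletePodHostRoute removes the host route for a pod IP so the local node
// stops treating the (now unassigned) address as reachable.
func deletePodHostRoute(podIP string) error {
	_, dst, err := net.ParseCIDR(podIP + "/32")
	if err != nil {
		return err
	}
	// Match on destination only; the CNI installs this /32 toward the veth
	// when it sets up the sandbox, so deleting it undoes that wiring.
	return netlink.RouteDel(&netlink.Route{Dst: dst})
}

func main() {
	if err := deletePodHostRoute("10.0.2.99"); err != nil {
		log.Fatalf("failed to remove host route: %v", err)
	}
}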

How to reproduce it (as minimally and precisely as possible):

  • create a pod with liveness and readiness probes, for example:
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness3
  name: liveness-http3
spec:
  containers:
  - name: ngo-proxy
    image: gcr.io/google_containers/echoserver:1.4
    # args:
    # - /server
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /health
        port: 8080
        # httpHeaders:
        # - name: Custom-Header
        #   value: Awesome
      initialDelaySeconds: 60
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /health
        port: 8080
      # initialDelaySeconds: 50
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 2
  restartPolicy: Always
  • remove the IP from the node the pod is scheduled on, at any time (an illustrative way to do this via the EC2 API is sketched below)
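
To reproduce the external reclamation itself, the same UnassignPrivateIpAddresses call from the event log can be issued directly, for example with the AWS SDK for Go v2 (the ENI ID below is a placeholder for the interface holding the pod IP):

package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	client := ec2.NewFromConfig(cfg)
	// Unassign the pod's secondary private IP out from under the running
	// sandbox, mimicking the external automation in the event log above.
	_, err = client.UnassignPrivateIpAddresses(ctx, &ec2.UnassignPrivateIpAddressesInput{
		NetworkInterfaceId: aws.String("eni-0123456789abcdef0"), // placeholder ENI ID
		PrivateIpAddresses: []string{"10.0.2.99"},
	})
	if err != nil {
		log.Fatal(err)
	}
}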

Anything else we need to know?:

  • during the sweep phase of the nodeIPPoolReconcile process, should the CNI be invoked to updateHostNetwork for the removed IPs? (a hypothetical shape of such a hook is sketched after this list)
  • see issue
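
To make the first question above concrete, a hypothetical shape of what such a sweep hook could look like (this does not exist in ipamd today; deleteHostRoute stands in for whatever host-network teardown the CNI would perform, e.g. the route deletion sketched earlier):

// sweepAndCleanHostNetwork is a hypothetical extension of the reconcile
// sweep: after a stale IP is unassigned from the datastore (which already
// happens, per the UnAssignPodIPAddress log line above), also tear down the
// host-side networking for the sandbox that held it.
func sweepAndCleanHostNetwork(staleIPs []string, deleteHostRoute func(ip string) error) {
	for _, ip := range staleIPs {
		if err := deleteHostRoute(ip); err != nil {
			// Best effort: the route may already be gone or never existed.
			continue
		}
	}
}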

Environment:

  • Kubernetes version (use kubectl version):
  • CNI Version: image: 602401143452.dkr.ecr.us-west-1.amazonaws.com/amazon-k8s-cni-init:v1.15.3-eksbuild.1
  • OS (e.g: cat /etc/os-release):
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
  • Kernel (e.g. uname -a):
Linux ....compute.internal 5.10.198-187.748.amzn2.x86_64 #1 SMP Tue Oct 24 19:49:54 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

@AbeOwlu what is this "external event" that reclaims an IP on an ENI? Only the IPAM daemon should be assigning and unassigning IPs on an ENI. Before calling the EC2 API to unassign IPs, it removes those IPs from the datastore. That precondition is required to avoid this exact scenario.

There's an automation pipeline that is (incorrectly, I might add) seeing drift in the VPC network and unassigning an IP from the EC2 instance at the moment.

  • looking into this further, it actually appears the CRI attempts to recreate the container sandbox, but the CNI was not responsive (connection refused on the 3 attempts), so the orchestrator may be handling this case.

Will update with more details and logs...

I think I hit this issue too. Let me circle back with some more info

We had this issue too: aws/amazon-vpc-resource-controller-k8s#412, which deleted the branch ENI from pods. The CNI didn't do anything about the missing network interface or lost IP address.

@AbeOwlu - the CNI will not remove any interface that it doesn't manage. For any external changes introduced to the interfaces that the CNI does manage, it will garbage collect them if they are not in use. If that didn't happen and you can reproduce this as a bug, let us know. Otherwise, we can close this ticket.