aws/aws-network-policy-agent

Startup probe is failing because NetPol blocks node port IP

Closed this issue · 7 comments

Hello,

I've encountered an issue with NetPol: it blocks the startup probe even after upgrading the VPC CNI to version 1.15.1 (my agent version is 1.0.4). Unfortunately, my startup probe is still failing, and it appears to be blocked by a network policy — when I delete the NetPol, the startup probe passes. Is there something I might be missing or overlooking?

Thanks in advance

Startup probe failed: Get "http://10.0.30.183:8080/actuator/health/liveness": dial tcp 10.0.30.183:8080: connect: connection refused

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  labels:
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: service-network-policy
  name: service
  namespace: dev
spec:
  egress:
    - ports:
        - port: 53
          protocol: UDP
      to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
    - ports:
        - port: 4317
          protocol: TCP
      to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: opentelemetry
          podSelector:
            matchLabels:
              app.kubernetes.io/component: opentelemetry-collector
    - ports:
        - port: 443
          protocol: TCP
      to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: default
          podSelector:
            matchLabels:
              component: apiserver
              provider: kubernetes
    - ports:
        - port: 8080
          protocol: TCP
      to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: maintenance-service
    - ports:
        - port: 8080
          protocol: TCP
      to:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: metric-service
  podSelector:
    matchLabels:
      app.kubernetes.io/name: service
  policyTypes:
    - Ingress
    - Egress

@atilsensalduz All traffic from the Node IP will be allowed, and we derive the Node IP via IMDS. Unless IMDS access is blocked for the aws-node pod, we don't see a reason for the above failure. Can you share the network policy agent logs located at /var/log/aws-routed-eni/network-policy-agent.log? You can mail the logs to k8s-awscni-triage@amazon.com

Hi @achevuru,

I wanted to express my appreciation for your assistance. I've sent an email as well.

The IP mentioned in the Unhealthy event in the Kubernetes pod logs is the pod IP. But the pod IP is actually one of the node's IPs, right? This shouldn't be causing a block, correct?

Startup probe failed: Get "http://10.0.6.216:8080/actuator/health/liveness": dial tcp 10.0.6.216:8080: connect: connection refused

The NodeIP mentioned by @achevuru is the eth0 IP of the node. Unless the pod uses host networking, its IP will be one of the node's secondary IPs. You will have to allow the pod traffic in your NetworkPolicy, but we will check the logs and get back to you.
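Allowing the pod traffic described above could be sketched as an ingress rule added to the policy. This is only an illustrative fragment: it assumes the pod and node IPs are drawn from a 10.0.0.0/16 VPC CIDR (consistent with the 10.0.x.x addresses in the logs, but not confirmed in the thread) and reuses the probe port 8080 from the failure message.

```yaml
# Hypothetical ingress addition to the "service" NetworkPolicy above.
# Assumption: 10.0.0.0/16 is the VPC CIDR that node/pod secondary IPs
# come from; 8080 is the startup probe port seen in the events.
ingress:
  - from:
      - ipBlock:
          cidr: 10.0.0.0/16
    ports:
      - port: 8080
        protocol: TCP
```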

@atilsensalduz Can you enable access logs and collect the network policy agent logs again? Also, can you do a describe on the policyendpoints resources tied to the above network policy? (They will have the same name, service-network-policy-****.)

Hi @achevuru and @jayanthvn, I've sent the access logs and a description of the policy endpoint via email.

I've come across a deny log associated with my pod's IP. It seems to be blocking traffic to the Kubernetes service in the default namespace, even though the network policy definition includes the following rule 🤔

- ports:
  - port: 443
    protocol: TCP
  to:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: default
    podSelector:
      matchLabels:
        component: apiserver
        provider: kubernetes

{"level":"info","ts":"2023-10-25T16:24:32.231Z","logger":"ebpf-client","msg":"Flow Info: ","Src IP":"10.0.15.252","Src Port":48586,"Dest IP":"172.20.0.1","Dest Port":443,"Proto":"TCP","Verdict":"DENY"}

@atilsensalduz We will not be able to select API Server pods on EKS clusters using pod selector labels, so the above policy will not allow traffic to API Server endpoints. The Kubernetes service VIP will be the .1 IP in your cluster's service CIDR, so you would have to explicitly specify it in an ipBlock CIDR field if you want to selectively enable traffic to the API server.
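Concretely, the apiserver egress rule above could be rewritten with an ipBlock in place of the namespace/pod selectors. This is a sketch under the assumption that the Kubernetes service VIP is 172.20.0.1 (the .1 address of the service CIDR, matching the denied destination in the flow log above); verify your cluster's actual VIP with `kubectl get svc kubernetes -n default` before using it.

```yaml
# Replaces the apiserver podSelector rule with an ipBlock targeting the
# Kubernetes service VIP. Assumption: 172.20.0.1 is this cluster's VIP,
# taken from the DENY flow log above.
- ports:
    - port: 443
      protocol: TCP
  to:
    - ipBlock:
        cidr: 172.20.0.1/32
```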

Thanks, everyone! The issue was resolved when I updated the network policy rule to use the IP address of the Kubernetes service directly. However, I'm curious whether there's a way to allow traffic to a Kubernetes service without relying on a hard-coded IP address. Any insights would be appreciated!