tkestack/galaxy

Pod liveness and readiness gates failed with ipvlan l2 mode

chenchun opened this issue · 17 comments

@chenchun I ran into this problem when using a floating IP: the pod health checks fail.
The deployment is:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-floatingip
spec:
  strategy:
    type: Recreate
  replicas: 3
  selector:
    matchLabels:
      app: nginx-floatingip
  template:
    metadata:
      name: nginx-floatingip
      labels:
        app: nginx-floatingip
      annotations:
        k8s.v1.cni.cncf.io/networks: "galaxy-k8s-vlan"
        k8s.v1.cni.galaxy.io/release-policy: "immutable"
    spec:
      tolerations:
        - operator: "Exists"
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
          - name: http-80
            containerPort: 80
        resources:
          requests:
            cpu: "0.1"
            memory: "32Mi"
            tke.cloud.tencent.com/eni-ip: "1"
          limits:
            cpu: "0.1"
            memory: "32Mi"
            tke.cloud.tencent.com/eni-ip: "1"
        livenessProbe:
          # httpGet:
          #   path: /
          #   port: 80
          #   scheme: HTTP
          tcpSocket:
            port: 80
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
          timeoutSeconds: 1
        readinessProbe:
          # httpGet:
          #   path: /
          #   port: 80
          #   scheme: HTTP
          tcpSocket:
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
          successThreshold: 2
          failureThreshold: 3
          timeoutSeconds: 1

The pod keeps restarting because the health check probes fail.
This is the pod description (kubectl describe) output:

  Warning  FailedScheduling  82s                default-scheduler  deployment nginx-floatingip has allocated 3 ips with replicas of 3, wait for releasing
  Warning  FailedScheduling  82s                default-scheduler  deployment nginx-floatingip has allocated 3 ips with replicas of 3, wait for releasing
  Normal   Scheduled         78s                default-scheduler  Successfully assigned default/nginx-floatingip-5cdcd7bcbd-6ql2x to 10.177.140.18
  Warning  Unhealthy         16s (x3 over 36s)  kubelet            Liveness probe failed: dial tcp 10.177.140.44:80: i/o timeout

This issue is about galaxy-ipam liveness and readiness gates.
Can you provide more information? Can you ping the pod IP from the host network? Can you curl the pod port from inside the pod?

@chenchun
ping and curl both succeed from inside the pod:

[root@k8s-master-01 ~]# kubectl get po -o wide
NAME                                READY   STATUS    RESTARTS   AGE    IP              NODE            NOMINATED NODE   READINESS GATES
nginx-floatingip-c895bbb7f-hs9bk    1/1     Running   1          2d     10.177.140.46   10.177.140.16   <none>           <none>
nginx-floatingip-c895bbb7f-tkl8j    1/1     Running   0          2d     10.177.140.53   10.177.140.18   <none>           <none>
nginx-floatingip-c895bbb7f-tplc9    1/1     Running   1          2d     10.177.140.44   10.177.140.16   <none>           <none>
[root@k8s-master-01 ~]# kubectl exec -it nginx-floatingip-c895bbb7f-hs9bk -- sh
/ # ping 10.177.140.46
PING 10.177.140.46 (10.177.140.46): 56 data bytes
64 bytes from 10.177.140.46: seq=0 ttl=64 time=0.046 ms
64 bytes from 10.177.140.46: seq=1 ttl=64 time=0.070 ms
^C
--- 10.177.140.46 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.046/0.058/0.070 ms
/ # curl 10.177.140.46
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>
/ #

But ping and curl to a pod both fail from the node that the pod is scheduled on. For example, pod nginx-floatingip-c895bbb7f-hs9bk is scheduled on node 10.177.140.16, and from that node both ping and curl to it fail.

May I ask what the underlay network is? Is it a VPC or an IDC network?
Many VPC networks drop packets from unknown MAC addresses. If you use the galaxy-k8s-vlan CNI, it connects pods to the host via a veth pair, so pods have their own MAC addresses.
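
A quick way to check which case you are in (a rough sketch, not from the thread: it assumes the pod interface is eth0 and the host NIC is ens192, so adjust names to your setup). With the veth-based modes the two MAC addresses differ, while with ipvlan the pod interface shares the host NIC's MAC:

# on the node: MAC of the physical NIC
cat /sys/class/net/ens192/address

# from the master: MAC of the pod's interface
kubectl exec -it nginx-floatingip-c895bbb7f-hs9bk -- cat /sys/class/net/eth0/address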

Yes, I use galaxy-k8s-vlan on an IDC underlay network. The galaxy.json is:

    {
      "NetworkConf":[
        {"name":"tke-route-eni","type":"tke-route-eni","eni":"eth1","routeTable":1},
        {"name":"galaxy-flannel","type":"galaxy-flannel", "delegate":{"type":"galaxy-veth"},"subnetFile":"/run/flannel/subnet.env"},
        {"name":"galaxy-k8s-vlan","type":"galaxy-k8s-vlan", "device":"ens192", "switch":"ipvlan", "ipvlan_mode":"l2"},
        {"name":"galaxy-k8s-sriov","type": "galaxy-k8s-sriov", "device": "ens192", "vf_num": 10}
      ],
      "DefaultNetworks": ["galaxy-flannel"],
      "ENIIPNetwork": "galaxy-k8s-vlan"
    }

How can I create a veth pair in the pod when using ipvlan mode? This has been bothering me a lot. I have already turned on promiscuous mode on the host.
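
For reference, enabling and verifying promiscuous mode on the parent device looks roughly like this (a sketch assuming ens192 is the parent, as in the galaxy.json above; the flags in the ip link show output should include PROMISC):

ip link set ens192 promisc on
ip link show ens192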

Do you mean that pinging a pod on another node is unreachable, or pinging another pod on the same node?

If the pod runs on a node, pinging that pod from that same node is unreachable.

moby/moby#21735 (comment) @currycan

Note: In both Macvlan and Ipvlan you are not able to ping or communicate with the default namespace IP address. For example, if you create a container and try to ping the Docker host's eth0 it will not work. That traffic is explicitly filtered by the kernel modules themselves to offer additional provider isolation and security.

The default namespace is not reachable per ipvlan design in order to isolate container namespaces from the underlying host.
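
The usual kernel-level workaround for this (outside of galaxy; it is not something the galaxy plugins set up for you) is to give the host its own ipvlan slave on the same parent device and route pod traffic through it. A minimal sketch, assuming ens192 is the parent and 10.177.140.44 is a pod IP on this node; 10.177.140.250 is only a placeholder for a free address in the underlay subnet:

# host-side ipvlan slave on the same parent as the pods
ip link add ipvl-host link ens192 type ipvlan mode l2
ip link set ipvl-host up

# placeholder address; pick a free IP from the underlay subnet
ip addr add 10.177.140.250/32 dev ipvl-host

# send traffic for the pod IP (or the whole pod range) via the host-side slave
ip route add 10.177.140.44/32 dev ipvl-host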

@chenchun
If floating IPs are used, the pod's livenessProbe and readinessProbe become unusable, which is a serious problem.
I found some related information at https://hansedong.github.io/2019/03/19/14/
But how can I create another veth pair in the pod like this:

{
    "name": "cni0",
    "cniVersion": "0.3.1",
    "plugins": [
        {
            "nodename": "k8s-node-2",
            "name": "myipvlan",
            "type": "ipvlan",
            "debug": true,
            "master": "eth0",
            "mode": "l2",
            "ipam": {
                "type": "host-local",
                "subnet": "172.18.12.0/24",
                "rangeStart": "172.18.12.211",
                "rangeEnd": "172.18.12.230",
                "gateway": "172.18.12.1",
                "routes": [
                    {
                        "dst": "0.0.0.0/0"
                    }
                ]
            }
        },
        {
            "name": "ptp",
            "type": "unnumbered-ptp",
            "hostInterface": "eth0",
            "containerInterface": "veth0",
            "ipMasq": true
        }
    ]
}

@currycan I would rather suggest using galaxy-underlay-veth, which is based on proxy_arp, instead of ipvlan.
It's the ideal solution: livenessProbe, readinessProbe, and Kubernetes Services all work.
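
For anyone following along, the NetworkConf change might look roughly like this; a sketch only, assuming galaxy-underlay-veth takes a plain name/type entry like the other plugins and that ENIIPNetwork should now point at it. Check the galaxy documentation for the exact keys:

    {
      "NetworkConf":[
        {"name":"galaxy-flannel","type":"galaxy-flannel", "delegate":{"type":"galaxy-veth"},"subnetFile":"/run/flannel/subnet.env"},
        {"name":"galaxy-underlay-veth","type":"galaxy-underlay-veth"}
      ],
      "DefaultNetworks": ["galaxy-flannel"],
      "ENIIPNetwork": "galaxy-underlay-veth"
    }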

@chenchun
I switched to galaxy-underlay-veth, and the probes work well.
But something seems wrong with the network: domain names can't be resolved in the pod:

/ # nslookup cloud.tencent.com
;; connection timed out; no servers could be reached

/ # cat /etc/resolv.conf
nameserver 172.31.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
/ # route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.177.143.254  0.0.0.0         UG    0      0        0 eth0
10.177.140.0    0.0.0.0         255.255.252.0   U     0      0        0 eth0

Can you try pinging 172.31.0.10, and also try pinging the coredns pod IP directly?
Is your coredns pod using the flannel network? Does the flannel network still work between these two hosts?
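
Those checks can be run from inside the pod, roughly like this (commands only; <coredns-pod-ip> is a placeholder taken from kubectl -n kube-system get po -o wide):

/ # ping 172.31.0.10
/ # ping <coredns-pod-ip>
/ # nslookup cloud.tencent.com 172.31.0.10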

I also suggest trying to run coredns with host network, which is simpler and more reliable.
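
A rough sketch of that change on a standard coredns Deployment (these are ordinary Kubernetes pod spec fields; adjust to your own manifests):

# kubectl -n kube-system edit deployment coredns
spec:
  template:
    spec:
      hostNetwork: true
      dnsPolicy: Default    # resolve upstream via the node's /etc/resolv.conf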

Coredns and flannel are both running, and coredns is using the flannel CNI. Pinging the coredns cluster IP and pod IP both succeed.
If I run coredns with host network, do I still need to create a Service for coredns?

@chenchun I tested it for a long time and finally found that it was a problem with the dnsPolicy configuration of the coredns deployment. The value of dnsPolicy must be "Default".
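
For anyone hitting the same issue, that change can be applied roughly like this (assuming the standard coredns deployment in kube-system):

kubectl -n kube-system patch deployment coredns \
  --type merge -p '{"spec":{"template":{"spec":{"dnsPolicy":"Default"}}}}'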

So, everything is working now?

@chenchun Yes, thank you very much!