tkestack/galaxy

galaxy-k8s-vlan ipvlan mode does not work


My test YAML:

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: common-nginx
  labels:
    app: common-nginx
spec:
  replicas: 6
  selector:
    matchLabels:
      app: common-nginx
  template:
    metadata:
      name: common-nginx
      labels:
        app: common-nginx
      annotations:
        k8s.v1.cni.cncf.io/networks: "galaxy-k8s-vlan"
    spec:
      containers:
      - name: nginx
        image: registry.tcnp.com/library/nginx
        resources:
          requests:
            tke.cloud.tencent.com/eni-ip: "1"
          limits:
            tke.cloud.tencent.com/eni-ip: "1"

My galaxy config:

[root@localhost ~]# kubectl -n kube-system get cm galaxy-etc -o yaml
apiVersion: v1
data:
  galaxy.json: |
    {
      "NetworkConf":[
        {"name":"tke-route-eni","type":"tke-route-eni","eni":"eth1","routeTable":1},
        {"name":"galaxy-flannel","type":"galaxy-flannel", "delegate":{"type":"galaxy-veth"},"subnetFile":"/run/flannel/subnet.env"},
        {"name":"galaxy-k8s-vlan","type":"galaxy-k8s-vlan", "device":"eth0", "switch":"ipvlan"},
        {"name":"galaxy-k8s-sriov","type": "galaxy-k8s-sriov", "device": "eth0", "vf_num": 10}
      ],
      "DefaultNetworks": ["galaxy-flannel"],
      "ENIIPNetwork": "galaxy-k8s-vlan"
    }
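
A quick sanity check on the config above, as a minimal sketch: it confirms that every network named in `DefaultNetworks` and `ENIIPNetwork` is actually defined in `NetworkConf`, and that the `galaxy-k8s-vlan` entry really has `"switch":"ipvlan"`. The helper name `undefined_networks` is made up for this example and is not part of Galaxy.

```python
import json

# galaxy.json exactly as shown in the ConfigMap above.
GALAXY_JSON = """
{
  "NetworkConf":[
    {"name":"tke-route-eni","type":"tke-route-eni","eni":"eth1","routeTable":1},
    {"name":"galaxy-flannel","type":"galaxy-flannel", "delegate":{"type":"galaxy-veth"},"subnetFile":"/run/flannel/subnet.env"},
    {"name":"galaxy-k8s-vlan","type":"galaxy-k8s-vlan", "device":"eth0", "switch":"ipvlan"},
    {"name":"galaxy-k8s-sriov","type": "galaxy-k8s-sriov", "device": "eth0", "vf_num": 10}
  ],
  "DefaultNetworks": ["galaxy-flannel"],
  "ENIIPNetwork": "galaxy-k8s-vlan"
}
"""


def undefined_networks(conf):
    """Return referenced network names that have no entry in NetworkConf.

    Hypothetical helper for illustration only.
    """
    defined = {net["name"] for net in conf["NetworkConf"]}
    referenced = list(conf.get("DefaultNetworks", []))
    if "ENIIPNetwork" in conf:
        referenced.append(conf["ENIIPNetwork"])
    return [name for name in referenced if name not in defined]


conf = json.loads(GALAXY_JSON)
vlan = next(n for n in conf["NetworkConf"] if n["name"] == conf["ENIIPNetwork"])

print("undefined networks:", undefined_networks(conf))
print("ENI IP network switch:", vlan["switch"])
```

Both checks pass for this config, so the problem is not a dangling network name.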

My floatingip config:

[root@localhost ~]# kubectl -n kube-system get cm floatingip-config  -o yaml
apiVersion: v1
data:
  floatingips: '[{"nodeSubnets":["192.168.104.0/24"],"ips":["192.168.104.130~192.168.104.180"],"subnet":"192.168.104.0/24","gateway":"192.168.104.254"}]'
kind: ConfigMap

The pods already have IPs in the 192.168.104.0/24 CIDR:

[root@localhost ~]# kubectl get pod -o wide
NAME                 READY   STATUS    RESTARTS   AGE   IP                NODE              NOMINATED NODE   READINESS GATES
common-nginx-c7d8f   1/1     Running   0          13h   192.168.104.131   192.168.104.111   <none>           <none>
common-nginx-ftpcf   1/1     Running   0          13h   192.168.104.153   192.168.104.128   <none>           <none>
common-nginx-gk8ss   1/1     Running   0          13h   192.168.104.158   192.168.104.111   <none>           <none>
common-nginx-lwh2p   1/1     Running   0          13h   192.168.104.130   192.168.104.111   <none>           <none>
common-nginx-q8mq8   1/1     Running   0          13h   192.168.104.133   192.168.104.111   <none>           <none>
common-nginx-z85cj   1/1     Running   0          13h   192.168.104.142   192.168.104.111   <none>           <none>
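
To double-check that the assigned addresses really come from the floating IP pool, here is a minimal sketch that parses the `floatingips` value and tests each pod IP from the listing above. The helper names `parse_ranges` and `in_pool` are made up for this example; treating an entry without `~` as a single address is an assumption, not something taken from the galaxy-ipam docs.

```python
import ipaddress
import json

# Value of the "floatingips" key from the floatingip-config ConfigMap.
FLOATINGIPS = (
    '[{"nodeSubnets":["192.168.104.0/24"],'
    '"ips":["192.168.104.130~192.168.104.180"],'
    '"subnet":"192.168.104.0/24","gateway":"192.168.104.254"}]'
)

# Pod IPs from the `kubectl get pod -o wide` output.
POD_IPS = [
    "192.168.104.131",
    "192.168.104.153",
    "192.168.104.158",
    "192.168.104.130",
    "192.168.104.133",
    "192.168.104.142",
]


def parse_ranges(entry):
    """Turn "first~last" strings into (first, last) address pairs."""
    ranges = []
    for item in entry["ips"]:
        if "~" in item:
            first, last = item.split("~")
        else:
            # Assumption: a bare address means a one-address range.
            first = last = item
        ranges.append((ipaddress.ip_address(first), ipaddress.ip_address(last)))
    return ranges


def in_pool(ip, entry):
    """True if ip is inside the entry's subnet and one of its ranges."""
    addr = ipaddress.ip_address(ip)
    if addr not in ipaddress.ip_network(entry["subnet"]):
        return False
    return any(first <= addr <= last for first, last in parse_ranges(entry))


pool = json.loads(FLOATINGIPS)[0]
for ip in POD_IPS:
    print(ip, "in pool" if in_pool(ip, pool) else "NOT in pool")
```

All six pod IPs fall inside 192.168.104.130~192.168.104.180, so IP allocation itself looks fine and the problem is on the connectivity side.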

The gateway of my host node is 192.168.104.254:

[root@localhost ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.104.254 0.0.0.0         UG    0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
172.20.1.0      172.20.1.0      255.255.255.0   UG    0      0        0 flannel.1
172.20.2.0      172.20.2.0      255.255.255.0   UG    0      0        0 flannel.1
192.168.104.0   0.0.0.0         255.255.255.0   U     0      0        0 eth0

I used nsenter to enter the network namespace of pod common-nginx-q8mq8 (IP 192.168.104.133). From there I can't ping the gateway 192.168.104.254:

[root@localhost ~]# e common-nginx-q8mq8 default 
entering pod netns for default/common-nginx-q8mq8
nsenter -n --target 8733
[root@localhost ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.104.254 0.0.0.0         UG    0      0        0 eth0
192.168.104.0   0.0.0.0         255.255.255.0   U     0      0        0 eth0
[root@localhost ~]# ping 192.168.104.254
PING 192.168.104.254 (192.168.104.254) 56(84) bytes of data.
^C
--- 192.168.104.254 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 999ms

[root@localhost ~]# ping -c 3 192.168.104.254
PING 192.168.104.254 (192.168.104.254) 56(84) bytes of data.


--- 192.168.104.254 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2000ms

OS version: CentOS Linux release 7.5.1804 (Core)
kernel version: 4.4.236-1.el7.elrepo.x86_64
k8s version: v1.18.3
galaxy version: v1.0.4
galaxy-ipam version: v1.0.4

Talked offline. Galaxy has to send a gratuitous ARP request in the host network namespace, because an ipvlan L3 device is NOARP and relies on its master device for L2 connectivity. But a gratuitous ARP can't guarantee that the ARP cache entry stays around forever. Alternatively, the hardware switch may need additional configuration so that it knows the pod IP is on this machine.
Maybe it's better to use ipvlan L2 mode if k8s service is not required.
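
For anyone wondering what a gratuitous ARP request looks like on the wire, here is a rough sketch that builds one for the pod IP from this issue. This is not Galaxy's code (Galaxy is written in Go), just an illustration of the frame layout. The MAC `02:00:00:00:00:01` is a placeholder standing in for the master device's MAC (ipvlan slaves share the master's MAC), and `send_frame` is a hypothetical helper that needs root on Linux and is not called here.

```python
import socket
import struct


def mac_to_bytes(mac):
    """Convert "aa:bb:cc:dd:ee:ff" into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_gratuitous_arp(ip, mac):
    """Build an Ethernet frame carrying a gratuitous ARP request.

    In a gratuitous ARP the sender IP and the target IP are the same,
    and the frame is broadcast so neighbours can refresh their caches.
    """
    hw = mac_to_bytes(mac)
    addr = socket.inet_aton(ip)
    broadcast = b"\xff" * 6

    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    eth_header = broadcast + hw + struct.pack("!H", 0x0806)

    # ARP header: Ethernet (1), IPv4 (0x0800), hlen 6, plen 4, request (1).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP, then target MAC (zeroed) and target IP == sender IP.
    arp += hw + addr + b"\x00" * 6 + addr
    return eth_header + arp


def send_frame(ifname, frame):
    """Send a raw frame on ifname (Linux only, requires root)."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((ifname, 0))
        sock.send(frame)


# Placeholder MAC; in practice this would be eth0's MAC on the node.
frame = build_gratuitous_arp("192.168.104.133", "02:00:00:00:00:01")
print(len(frame), frame.hex())
```

A single frame like this only refreshes neighbour caches at the moment it is sent, which is why entries can still expire later, as noted above.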

closed by #91