gravitational/wormhole

[BUG] Service CIDR not added to wg config

stobias123 opened this issue · 3 comments

Describe the bug
I cannot access coredns service in the kube cluster.

To Reproduce

  1. Init a kube cluster like so:
kubeadm init --pod-network-cidr="10.75.0.0/16" --service-cidr="10.76.0.0/16" --apiserver-advertise-address x.x.x.x --ignore-preflight-errors=numcpu
  2. Join a node.
  3. Schedule a pod on the node.
  4. Try to resolve DNS from that pod (a quick way to do this is sketched just below).
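
A minimal way to run that last step, for reference (the pod name is a placeholder; any pod scheduled on the joined node will do, and 10.76.0.10 is the kube-dns cluster IP kubeadm derives from the --service-cidr above):

# lookup through the pod's configured resolver
kubectl exec -it <pod-on-the-node> -- host -v google.com
# lookup against the kube-dns cluster IP directly
kubectl exec -it <pod-on-the-node> -- host google.com 10.76.0.10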

Expected behavior
I expect DNS to resolve.

I am able to ping pods on the master node, but unable to access the k8s coredns service.

Logs

$ k get po -o wide -n default
NAME                                 READY   STATUS    RESTARTS   AGE    IP           NODE         NOMINATED NODE   READINESS GATES
master-deployment-74dc87d469-l6dsw   1/1     Running   0          6m6s   10.75.0.14   master-k8s   <none>           <none>
node-deployment-8474cfc7bb-w7cbv     1/1     Running   0          6m     10.75.1.13   node-k8s     <none>           <none>

POD1 (scheduled on master)

root@master-deployment-74dc87d469-l6dsw:/# host -v google.com
Trying "google.com.default.svc.cluster.local"
Trying "google.com.svc.cluster.local"
Trying "google.com.cluster.local"
Trying "google.com"
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62988
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:

root@master-deployment-74dc87d469-l6dsw:/# ping 10.75.1.13
PING 10.75.1.13 (10.75.1.13) 56(84) bytes of data.
64 bytes from 10.75.1.13: icmp_seq=1 ttl=62 time=11.2 ms
^C

POD2 (scheduled on node)

root@node-deployment-8474cfc7bb-w7cbv:/# host -v google.com
Trying "google.com.default.svc.cluster.local"
;; connection timed out; no servers could be reached
root@node-deployment-8474cfc7bb-w7cbv:/# ping 10.75.0.14
PING 10.75.0.14 (10.75.0.14) 56(84) bytes of data.
64 bytes from 10.75.0.14: icmp_seq=1 ttl=62 time=11.7 ms
64 bytes from 10.75.0.14: icmp_seq=2 ttl=62 time=11.2 ms
^C
--- 10.75.0.14 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 11.209/11.485/11.762/0.296 ms
root@node-deployment-8474cfc7bb-w7cbv:/#

Versions:

  • OS: Debian
  • Kubernetes: v1.14.3

Hi @stobias123, thanks for the report. I quickly tried to reproduce with kubeadm 1.14.3 and am unable to do so.

The service CIDR isn't actually expected to be part of the WireGuard configuration. Each node configures NAT rules for the service network: when pods or the host generate packets towards a service IP, the node NATs the traffic to one of the backing pods, and the overlay network then routes the packets to whichever pod the NAT rules have load balanced to. Depending on configuration, the load balancing is done in iptables or IPVS.
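
If you want to see that mapping concretely on a node, a generic check (not wormhole-specific; 10.76.0.10 is the kube-dns cluster IP allocated from the --service-cidr above) would be something like:

# rules kube-proxy programmed for the kube-dns cluster IP (iptables mode)
iptables-save -t nat | grep 10.76.0.10
# or, if kube-proxy is running in ipvs mode and ipvsadm is installed
ipvsadm -Ln | grep -A 2 10.76.0.10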

So trying to reproduce:

Cluster Setup

kubeadm init --pod-network-cidr="10.75.0.0/16" --service-cidr="10.76.0.0/16" --apiserver-advertise-address x.x.x.x --ignore-preflight-errors=numcpu
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
kubectl apply -f https://raw.githubusercontent.com/gravitational/wormhole/master/docs/kube-wormhole.yaml
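
Before checking the nodes, it's also worth confirming that the wormhole pods came up; a generic check, assuming the manifest above installs them into kube-system:

kubectl -n kube-system get daemonsets
kubectl -n kube-system get pods -o wide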

Nodes and Cluster DNS

root@kevin-test5:~# kubectl get nodes
NAME          STATUS   ROLES    AGE     VERSION
kevin-test3   Ready    <none>   96s     v1.14.3
kevin-test4   Ready    <none>   98s     v1.14.3
kevin-test5   Ready    master   2m23s   v1.14.3


kubectl -n kube-system get svc
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.76.0.10   <none>        53/UDP,53/TCP,9153/TCP   2m28s

From the host of the master:

root@kevin-test5:~# dig @10.76.0.10 google.ca

; <<>> DiG 9.10.3-P4-Ubuntu <<>> @10.76.0.10 google.ca
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 63406
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.ca.			IN	A

;; ANSWER SECTION:
google.ca.		30	IN	A	172.217.13.195

;; Query time: 31 msec
;; SERVER: 10.76.0.10#53(10.76.0.10)
;; WHEN: Wed Jun 19 04:55:36 UTC 2019
;; MSG SIZE  rcvd: 63

Deploy some test pods:

root@kevin-test5:~# kubectl run centos --image=centos --replicas=9 -- bash -c "sleep 86400"
root@kevin-test5:~# kubectl get po -o wide
NAME                      READY   STATUS    RESTARTS   AGE     IP           NODE          NOMINATED NODE   READINESS GATES
centos-598f55cd46-2frxh   1/1     Running   0          6m39s   10.75.2.16   kevin-test3   <none>           <none>
centos-598f55cd46-72lkv   1/1     Running   0          6m39s   10.75.2.15   kevin-test3   <none>           <none>
centos-598f55cd46-88lh7   1/1     Running   0          6m39s   10.75.1.22   kevin-test4   <none>           <none>
centos-598f55cd46-gv7tm   1/1     Running   0          6m39s   10.75.1.20   kevin-test4   <none>           <none>
centos-598f55cd46-mrm2p   1/1     Running   0          6m39s   10.75.2.14   kevin-test3   <none>           <none>
centos-598f55cd46-nj2mz   1/1     Running   0          6m39s   10.75.1.21   kevin-test4   <none>           <none>
centos-598f55cd46-qrdn4   1/1     Running   0          6m39s   10.75.2.17   kevin-test3   <none>           <none>
centos-598f55cd46-rq9lr   1/1     Running   0          6m39s   10.75.1.19   kevin-test4   <none>           <none>
centos-598f55cd46-sxj5g   1/1     Running   0          6m39s   10.75.1.23   kevin-test4   <none>           <none>

Try from within a pod:

kubectl exec -it centos-598f55cd46-2frxh bash
[root@centos-598f55cd46-2frxh /]# ping -c1 google.ca
PING google.ca (172.217.13.195) 56(84) bytes of data.
64 bytes from yul03s05-in-f3.1e100.net (172.217.13.195): icmp_seq=1 ttl=58 time=0.605 ms

--- google.ca ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.605/0.605/0.605/0.000 ms
[root@centos-598f55cd46-2frxh /]# cat /etc/resolv.conf
nameserver 10.76.0.10
search default.svc.cluster.local svc.cluster.local cluster.local c.kubeadm-167321.internal google.internal
options ndots:5

So I'm not sure why this isn't working based on the information you have provided. I would look into whether Kubernetes is setting up the NAT rules for the DNS service, and why those rules don't seem to be getting invoked. On my system the iptables rules for kube-dns look like this, and these rules should be present on all nodes in the cluster:

root@kevin-test5:~# iptables-save | grep -e kube-dns -e KUBE-SVC-TCOU7JCQXEZGVUNU -e KUBE-SVC-ERIFXISQEP7F7OF4 -e KUBE-SVC-JD5MR3NA4I4DYORP -e KUBE-SEP-L7RB4ZVBMZD44RQS -e KUBE-SEP-2MWWTJTZRAGSAFSS -e KUBE-SEP-SDWX5KZ4DX25KGP7 -e KUBE-SEP-LJXRIHI2QMWQQFA2 -e KUBE-SEP-XAXO2YXE2AA3KJFH -e KUBE-SEP-UB46T74A7HLUOOSR
:KUBE-SEP-2MWWTJTZRAGSAFSS - [0:0]
:KUBE-SEP-L7RB4ZVBMZD44RQS - [0:0]
:KUBE-SEP-LJXRIHI2QMWQQFA2 - [0:0]
:KUBE-SEP-SDWX5KZ4DX25KGP7 - [0:0]
:KUBE-SEP-UB46T74A7HLUOOSR - [0:0]
:KUBE-SEP-XAXO2YXE2AA3KJFH - [0:0]
:KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
:KUBE-SVC-JD5MR3NA4I4DYORP - [0:0]
:KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
-A KUBE-SEP-2MWWTJTZRAGSAFSS -s 10.75.1.18/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-2MWWTJTZRAGSAFSS -p tcp -m tcp -j DNAT --to-destination 10.75.1.18:53
-A KUBE-SEP-L7RB4ZVBMZD44RQS -s 10.75.1.17/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-L7RB4ZVBMZD44RQS -p tcp -m tcp -j DNAT --to-destination 10.75.1.17:53
-A KUBE-SEP-LJXRIHI2QMWQQFA2 -s 10.75.1.18/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-LJXRIHI2QMWQQFA2 -p tcp -m tcp -j DNAT --to-destination 10.75.1.18:9153
-A KUBE-SEP-SDWX5KZ4DX25KGP7 -s 10.75.1.17/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-SDWX5KZ4DX25KGP7 -p tcp -m tcp -j DNAT --to-destination 10.75.1.17:9153
-A KUBE-SEP-UB46T74A7HLUOOSR -s 10.75.1.18/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-UB46T74A7HLUOOSR -p udp -m udp -j DNAT --to-destination 10.75.1.18:53
-A KUBE-SEP-XAXO2YXE2AA3KJFH -s 10.75.1.17/32 -j KUBE-MARK-MASQ
-A KUBE-SEP-XAXO2YXE2AA3KJFH -p udp -m udp -j DNAT --to-destination 10.75.1.17:53
-A KUBE-SERVICES ! -s 10.75.0.0/16 -d 10.76.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.76.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
-A KUBE-SERVICES ! -s 10.75.0.0/16 -d 10.76.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.76.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:metrics cluster IP" -m tcp --dport 9153 -j KUBE-SVC-JD5MR3NA4I4DYORP
-A KUBE-SERVICES ! -s 10.75.0.0/16 -d 10.76.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.76.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
-A KUBE-SVC-ERIFXISQEP7F7OF4 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-L7RB4ZVBMZD44RQS
-A KUBE-SVC-ERIFXISQEP7F7OF4 -j KUBE-SEP-2MWWTJTZRAGSAFSS
-A KUBE-SVC-JD5MR3NA4I4DYORP -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-SDWX5KZ4DX25KGP7
-A KUBE-SVC-JD5MR3NA4I4DYORP -j KUBE-SEP-LJXRIHI2QMWQQFA2
-A KUBE-SVC-TCOU7JCQXEZGVUNU -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-XAXO2YXE2AA3KJFH
-A KUBE-SVC-TCOU7JCQXEZGVUNU -j KUBE-SEP-UB46T74A7HLUOOSR
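
If those chains exist on your node but lookups from pods still time out, one generic way to see whether the rules are actually being hit is to watch the nat table packet counters while the pod retries the lookup:

# zero the nat counters, retry the lookup from the affected pod, then check
iptables -t nat -Z
iptables -t nat -L KUBE-SERVICES -n -v | grep 10.76.0.10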

Hope this helps.

Turns out I had some artifacts from another CNI install. This works fine now. Sorry! 😬

Any chance there's a Gravitational Slack or IRC? Curious how I can exec into pods or get logs... seems to be an issue with NATted nodes.

Gravitational Community Slack: https://join.slack.com/t/grv8/shared_invite/enQtNTEwMTU2NjQ2NzU2LWIzYTM1ZDcyNWFhMzZhNDJhN2IyOWMwZDA4NDA4MmE2MjgzYWNiYmY1NjcxYjA3M2RmOTQ1ZjRkOTAwODE1NDY (although I need to create a wormhole channel)

You can also find me floating around in the main Kubernetes Slack as Kevin Nisbet.