Second node using docker0 instead of canal
mellotron opened this issue · 3 comments
I've set up a new Kubernetes cluster with kubeadm on two bare-metal nodes.
The LAN that they sit on is 192.168.1.x/24.
I initialized the first node with:
$ sudo kubeadm init --pod-network-cidr 10.244.0.0/16
I set up Canal following the steps in its installation guide, and joined node2 to the cluster with kubeadm.
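For completeness, the join step on node2 would have looked roughly like the following; the token is a placeholder and the API server port is the kubeadm default, neither is quoted from the report:
$ sudo kubeadm join --token <token> 192.168.1.5:6443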
However, when I join the second node and schedule pods on it, they come up on the docker0 interface, while node1 is doing the right thing with Canal:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system canal-6mtm6 3/3 Running 0 6m 192.168.1.5 node1
kube-system canal-n5db6 3/3 Running 0 5m 192.168.1.6 node2
kube-system etcd-node1 1/1 Running 0 7m 192.168.1.5 node1
kube-system heapster-1428305041-hvbp0 1/1 Running 0 3m 172.17.0.3 node2
kube-system kube-apiserver-node1 1/1 Running 0 7m 192.168.1.5 node1
kube-system kube-controller-manager-node1 1/1 Running 0 7m 192.168.1.5 node1
kube-system kube-dns-3913472980-c0cst 3/3 Running 0 7m 10.244.0.7 node1
kube-system kube-proxy-7s3h5 1/1 Running 0 5m 192.168.1.6 node2
kube-system kube-proxy-ft2p6 1/1 Running 0 7m 192.168.1.5 node1
kube-system kube-scheduler-node1 1/1 Running 0 7m 192.168.1.5 node1
kube-system monitoring-grafana-3975459543-r69tt 1/1 Running 0 3m 172.17.0.2 node2
kube-system monitoring-influxdb-3480804314-9708z 1/1 Running 0 3m 10.244.0.8 node1
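For reference, a listing like the one above can be produced with something like the following (the exact flags are my assumption, not quoted from the report):
$ kubectl get pods --all-namespaces -o wide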
On node1, I see:
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether ... brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether ... brd ff:ff:ff:ff:ff:ff
inet 10.244.0.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::.../64 scope link
valid_lft forever preferred_lft forever
9: cali70daf1af12c@if3:
...
While on node2, there's:
3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether ... brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
inet6 fe80::.../64 scope link
valid_lft forever preferred_lft forever
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether ... brd ff:ff:ff:ff:ff:ff
inet 10.244.1.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::.../64 scope link
valid_lft forever preferred_lft forever
5: cni0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether ... brd ff:ff:ff:ff:ff:ff
inet 10.244.1.1/24 scope global cni0
valid_lft forever preferred_lft forever
inet6 fe80::.../64 scope link
valid_lft forever preferred_lft forever
41: veth4e88edb@if40: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UP group default
...
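A quick way to confirm which bridge the pod veths on node2 are attached to (my own suggestion, not part of the original report) is to list interfaces by their master device:
node2 $ ip link show master docker0   # veths enslaved to the Docker bridge
node2 $ ip link show master cni0      # veths created by the CNI plugin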
However, the config looks good on node2:
node2 $ cat /etc/cni/net.d/10-calico.conf
{
  "name": "k8s-pod-network",
  "type": "calico",
  "log_level": "info",
  "datastore_type": "kubernetes",
  "hostname": "node2",
  "ipam": {
    "type": "host-local",
    "subnet": "usePodCidr"
  },
  "policy": {
    "type": "k8s",
    "k8s_auth_token": "..."
  },
  "kubernetes": {
    "k8s_api_root": "https://10.96.0.1:443",
    "kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
  }
}
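A valid file in /etc/cni/net.d is only half the story; the kubelet also needs to be running with the CNI network plugin and the Calico binaries need to be installed. Two sanity checks worth running (my suggestion, using the standard CNI paths, not commands from the report):
node2 $ ls /etc/cni/net.d/
node2 $ ls /opt/cni/bin/ | grep -i calico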
I attempted removing docker0 and redeploying one of the pods that had been placed on it, and got:
Error syncing pod, skipping: failed to "CreatePodSandbox" for "monitoring-influxdb-3480804314-msh2t_kube-system(4e2d8a46-2fe7-11e7-882e-00232478ac64)" with CreatePodSandboxError: "CreatePodSandbox for pod \"monitoring-influxdb-3480804314-msh2t_kube-system(4e2d8a46-2fe7-11e7-882e-00232478ac64)\" failed: rpc error: code = 2 desc = failed to start sandbox container for pod \"monitoring-influxdb-3480804314-msh2t\": Error response from daemon: {\"message\":\"failed to create endpoint k8s_POD_monitoring-influxdb-3480804314-msh2t_kube-system_4e2d8a46-2fe7-11e7-882e-00232478ac64_8 on network bridge: adding interface vethca446a1 to bridge docker0 failed: could not find bridge docker0: route ip+net: no such network interface\"}"
@mellotron can you verify in the logs that the Pod monitoring-influxdb-3480804314-msh2t was launched using the Calico CNI plugin?
It is probably worth verifying that the kubelet on node2 is configured with --network-plugin=cni and not --network-plugin=kubenet.
The Calico plugin should be used, and it should never try to connect a container to any bridge, so that's suspicious to me. It sounds like the kubelet might be launching that Pod with another network plugin for some reason.
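A quick way to check which network plugin the kubelet on node2 is actually running with (the drop-in path below is the usual kubeadm location, so treat it as an assumption):
node2 $ ps aux | grep [k]ubelet | tr ' ' '\n' | grep network-plugin
node2 $ grep -r network-plugin /etc/systemd/system/kubelet.service.d/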
This was my fault: I had removed KUBELET_NETWORK_ARGS while working around another bug, and kubeadm reset doesn't remove the systemd kubelet drop-in files, so kubeadm init didn't write a fresh one over the changed file. Sorry about that.
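For anyone who lands here with the same symptom, the fix amounts to restoring the network arguments in the kubelet drop-in and restarting the kubelet. The line below matches the kubeadm defaults of that era, so verify it against your own installation rather than copying it blindly:
# In /etc/systemd/system/kubelet.service.d/10-kubeadm.conf (kubeadm's default drop-in; path may differ)
Environment="KUBELET_NETWORK_ARGS=--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"

node2 $ sudo systemctl daemon-reload
node2 $ sudo systemctl restart kubelet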