`calico-kube-controllers` fails to reach `10.96.0.1:443` on startup
Bolodya1997 opened this issue · 3 comments
Environment
- Calico/VPP version: v0.16.0-calicov3.20.0
- Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T17:56:19Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-12T14:12:29Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}
- Deployment type: bare-metal on equinix.metal, host node OS is Ubuntu 20.04 LTS
- Network configuration:
control plane node
---
$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
link/ether e4:43:4b:6d:60:40 brd ff:ff:ff:ff:ff:ff
4: eno3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
link/ether e4:43:4b:6d:60:40 brd ff:ff:ff:ff:ff:ff
5: eno4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether e4:43:4b:6d:60:43 brd ff:ff:ff:ff:ff:ff
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether e4:43:4b:6d:60:40 brd ff:ff:ff:ff:ff:ff
inet 147.75.38.85/31 brd 255.255.255.255 scope global bond0
valid_lft forever preferred_lft forever
inet 10.99.35.131/31 brd 255.255.255.255 scope global bond0:0
valid_lft forever preferred_lft forever
inet6 2604:1380:0:2c00::3/127 scope global
valid_lft forever preferred_lft forever
inet6 fe80::e643:4bff:fe6d:6040/64 scope link
valid_lft forever preferred_lft forever
7: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:ee:ab:db:8d brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
8: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN group default qlen 1000
link/ether e4:43:4b:6d:60:41 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.1/30 brd 10.0.0.3 scope global eno2
valid_lft forever preferred_lft forever
inet6 fe80::e643:4bff:fe6d:6041/64 scope link
valid_lft forever preferred_lft forever
worker node
---
$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
link/ether e4:43:4b:5f:6d:50 brd ff:ff:ff:ff:ff:ff
4: eno3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP group default qlen 1000
link/ether e4:43:4b:5f:6d:50 brd ff:ff:ff:ff:ff:ff
5: eno4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether e4:43:4b:5f:6d:53 brd ff:ff:ff:ff:ff:ff
6: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether e4:43:4b:5f:6d:50 brd ff:ff:ff:ff:ff:ff
inet 147.75.75.133/31 brd 255.255.255.255 scope global bond0
valid_lft forever preferred_lft forever
inet 10.99.35.129/31 brd 255.255.255.255 scope global bond0:0
valid_lft forever preferred_lft forever
inet6 2604:1380:0:2c00::1/127 scope global
valid_lft forever preferred_lft forever
inet6 fe80::e643:4bff:fe5f:6d50/64 scope link
valid_lft forever preferred_lft forever
7: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:15:76:3c:b8 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
8: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN group default qlen 1000
link/ether e4:43:4b:5f:6d:51 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.2/30 brd 10.0.0.3 scope global eno2
valid_lft forever preferred_lft forever
inet6 fe80::e643:4bff:fe5f:6d51/64 scope link
valid_lft forever preferred_lft forever
Control plane node `eno2` and worker node `eno2` are in the same untagged VLAN.
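Before installing Calico/VPP, connectivity between the two `eno2` addresses over that VLAN can be confirmed with something like the following (a minimal sketch, not part of the original report):
$ ping -c 3 -I eno2 10.0.0.2   # from the control plane node
$ ping -c 3 -I eno2 10.0.0.1   # from the worker node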
Issue description
After applying `calico-vpp-nohuge.yaml` with `vpp_dataplane_interface: eno2` additionally configured, `calico-kube-controllers` ends up in `CrashLoopBackOff` with the following error in its logs:
$ kubectl -n kube-system logs calico-kube-controllers-58497c65d5-xm54w
2021-08-31 08:12:45.578 [INFO][1] main.go 94: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0831 08:12:45.580376 1 client_config.go:615] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2021-08-31 08:12:45.581 [INFO][1] main.go 115: Ensuring Calico datastore is initialized
2021-08-31 08:12:55.581 [ERROR][1] client.go 261: Error getting cluster information config ClusterInformation="default" error=Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
2021-08-31 08:12:55.581 [FATAL][1] main.go 120: Failed to initialize Calico datastore error=Get "https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": context deadline exceeded
To Reproduce
Steps to reproduce the behavior:
- Set up 2 n2.xlarge.x86 servers with Ubuntu 20.04 LTS on https://metal.equinix.com/ - or just set up any 2 bare-metal servers with Ubuntu 20.04 LTS.
- Configure both of them to have:
  - A public IPv4 address on one interface (here it is `bond0`, with `147.75.38.85/31` for the control plane node and `147.75.75.133/31` for the worker node).
  - Local IPv4 addresses in the same subnet on another interface (here it is `eno2`, with `10.0.0.1/30` for the control plane node and `10.0.0.2/30` for the worker node) - on Equinix Metal this can be done with a single VLAN assigned to the corresponding interfaces. A sketch of this address assignment is given after this list.
- Configure docker on both nodes:
#!/bin/bash
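# Set Docker's cgroup driver to systemd so it matches the driver the kubelet expects.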
mkdir -p /etc/docker
echo \
'{
"exec-opts": ["native.cgroupdriver=systemd"]
}' >/etc/docker/daemon.json
- Install environment on both nodes:
#!/bin/sh
KUBERNETES_VERSION=1.21.1-00
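# Add the upstream Kubernetes apt repository, then install docker.io and the pinned kubelet/kubectl/kubeadm packages.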
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y docker.io
apt-get install -qy kubelet="${KUBERNETES_VERSION}" kubectl="${KUBERNETES_VERSION}" kubeadm="${KUBERNETES_VERSION}"
systemctl daemon-reload
systemctl restart kubelet
swapoff --all
- Start kubernetes cluster on control plane node:
#!/bin/sh
set -e
K8S_DIR=$(dirname "$0")
KUBERNETES_INIT_VERSION=1.21.1
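# 192.168.0.0/16 below is the default Calico pod CIDR; the advertise address is this node's public IP.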
kubeadm init \
--kubernetes-version "${KUBERNETES_INIT_VERSION}" \
--pod-network-cidr=192.168.0.0/16 \
--skip-token-print \
--apiserver-advertise-address=147.75.38.85 # use here your control plane node public IP address
mkdir -p "$HOME"/.kube
sudo cp -f /etc/kubernetes/admin.conf "$HOME"/.kube/config
sudo chown "$(id -u):$(id -g)" "$HOME"/.kube/config
kubectl taint nodes --all node-role.kubernetes.io/master-
kubeadm token create --print-join-command > "${K8S_DIR}/join-cluster.sh"
- Copy the `join-cluster.sh` script from the control plane node to the worker node and run it.
- Set up the control plane node to use `10.0.0.1` as its node IP:
#!/bin/sh
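# Append --node-ip=10.0.0.1 to KUBELET_KUBEADM_ARGS in kubeadm-flags.env so the kubelet registers with the private address.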
sed -Ei 's/(.*)"/\1 --node-ip=10\.0\.0\.1"/g' /var/lib/kubelet/kubeadm-flags.env
systemctl restart kubelet
- Set up the worker node to use `10.0.0.2` as its node IP:
#!/bin/sh
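# Same as on the control plane node, but registering the worker with --node-ip=10.0.0.2.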
sed -Ei 's/(.*)"/\1 --node-ip=10\.0\.0\.2"/g' /var/lib/kubelet/kubeadm-flags.env
systemctl restart kubelet
- Copy `~/.kube/config` from the control plane node to your own host.
- Edit `calico-vpp-nohuge.yaml` (an editing sketch is given after this list) with:
vpp_dataplane_interface: eno2 # use here the interface name that carries the local IPv4 address on the nodes
- Run `kubectl apply -f calico-vpp-nohuge.yaml` from your own host.
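For the local-IPv4-address step above, a minimal (non-persistent) sketch of the address assignment, assuming plain ip commands rather than netplan:
#!/bin/sh
# On the control plane node:
ip link set eno2 up
ip addr add 10.0.0.1/30 dev eno2
# On the worker node:
ip link set eno2 up
ip addr add 10.0.0.2/30 dev eno2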
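For the `calico-vpp-nohuge.yaml` editing step, one possible way to make the change non-interactively (a sketch; it assumes the manifest has already been downloaded and that vpp_dataplane_interface appears exactly once in it):
#!/bin/sh
# Point the VPP dataplane at the interface that carries the local IPv4 address (eno2 here).
sed -i 's/vpp_dataplane_interface: .*/vpp_dataplane_interface: eno2/' calico-vpp-nohuge.yaml
# Double-check the value before running kubectl apply.
grep vpp_dataplane_interface calico-vpp-nohuge.yaml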
Expected behavior
All Calico pods should start running (possibly with a few restarts).
Additional context
- The same setup with `weave` as the CNI works correctly.
- The control plane node is able to reach `10.96.0.1:443`:
$ nc -vw 2 10.96.0.1 443
Connection to 10.96.0.1 443 port [tcp/https] succeeded!
- The worker node is able to reach `10.96.0.1:443`:
$ nc -vw 2 10.96.0.1 443
Connection to 10.96.0.1 443 port [tcp/https] succeeded!
- None of the pods with `hostNetwork: false` in `kube-system` is able to reach `10.96.0.1:443`:
$ kubectl -n kube-system exec alpine -- nc -vw 2 10.96.0.1 443
nc: 10.96.0.1 (10.96.0.1:443): Operation timed out
command terminated with exit code 1
- `kubectl get` output:
$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master-1 Ready control-plane,master 177m v1.21.1 10.0.0.1 <none> Ubuntu 20.04.3 LTS 5.4.0-81-generic docker://20.10.7
worker Ready <none> 176m v1.21.1 10.0.0.2 <none> Ubuntu 20.04.3 LTS 5.4.0-81-generic docker://20.10.7
$ kubectl -n calico-vpp-dataplane get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-vpp-node-bt5md 2/2 Running 0 174m 10.0.0.2 worker <none> <none>
calico-vpp-node-ssbr4 2/2 Running 0 174m 10.0.0.1 master-1 <none> <none>
$ kubectl -n kube-system get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
alpine 1/1 Running 0 12m 192.168.171.68 worker <none> <none>
calico-kube-controllers-58497c65d5-xm54w 0/1 CrashLoopBackOff 39 3h2m 192.168.171.67 worker <none> <none>
calico-node-4hws2 1/1 Running 0 3h2m 10.0.0.2 worker <none> <none>
calico-node-xrv7h 1/1 Running 0 3h2m 10.0.0.1 master-1 <none> <none>
coredns-558bd4d5db-dj9qx 0/1 Running 0 3h6m 192.168.171.65 worker <none> <none>
coredns-558bd4d5db-xsqdf 0/1 Running 0 3h6m 192.168.171.66 worker <none> <none>
etcd-master-1 1/1 Running 0 3h6m 10.0.0.1 master-1 <none> <none>
kube-apiserver-master-1 1/1 Running 0 3h6m 10.0.0.1 master-1 <none> <none>
kube-controller-manager-master-1 1/1 Running 0 3h6m 10.0.0.1 master-1 <none> <none>
kube-proxy-rpdrr 1/1 Running 0 3h6m 10.0.0.2 worker <none> <none>
kube-proxy-s74x9 1/1 Running 0 3h6m 10.0.0.1 master-1 <none> <none>
kube-scheduler-master-1 1/1 Running 0 3h6m 10.0.0.1 master-1 <none> <none>
$ kubectl get svc --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 3h10m
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 3h10m
- Both `coredns` pods fail to become ready, with the following logs:
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.0
linux/amd64, go1.15.3, 054c9ae
[ERROR] plugin/errors: 2 6061867283059196170.1749449744353646990. HINFO: read udp 192.168.171.65:60256->147.75.207.207:53: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[ERROR] plugin/errors: 2 6061867283059196170.1749449744353646990. HINFO: read udp 192.168.171.65:49716->147.75.207.208:53: i/o timeout
[ERROR] plugin/errors: 2 6061867283059196170.1749449744353646990. HINFO: read udp 192.168.171.65:59029->147.75.207.208:53: i/o timeout
[ERROR] plugin/errors: 2 6061867283059196170.1749449744353646990. HINFO: read udp 192.168.171.65:54880->147.75.207.208:53: i/o timeout
[ERROR] plugin/errors: 2 6061867283059196170.1749449744353646990. HINFO: read udp 192.168.171.65:39632->147.75.207.208:53: i/o timeout
[ERROR] plugin/errors: 2 6061867283059196170.1749449744353646990. HINFO: read udp 192.168.171.65:38548->147.75.207.208:53: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[ERROR] plugin/errors: 2 6061867283059196170.1749449744353646990. HINFO: read udp 192.168.171.65:60599->147.75.207.207:53: i/o timeout
[ERROR] plugin/errors: 2 6061867283059196170.1749449744353646990. HINFO: read udp 192.168.171.65:45834->147.75.207.208:53: i/o timeout
I0831 05:59:55.406322 1 trace.go:205] Trace[469339106]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156 (31-Aug-2021 05:59:25.405) (total time: 30000ms):
Trace[469339106]: [30.000806943s] [30.000806943s] END
E0831 05:59:55.406381 1 reflector.go:127] pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
I0831 05:59:55.406397 1 trace.go:205] Trace[436340495]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156 (31-Aug-2021 05:59:25.405) (total time: 30000ms):
Trace[436340495]: [30.000800448s] [30.000800448s] END
E0831 05:59:55.406443 1 reflector.go:127] pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: Get "https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
I0831 05:59:55.406501 1 trace.go:205] Trace[774965466]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156 (31-Aug-2021 05:59:25.405) (total time: 30000ms):
Trace[774965466]: [30.000768124s] [30.000768124s] END
E0831 05:59:55.406562 1 reflector.go:127] pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[ERROR] plugin/errors: 2 6061867283059196170.1749449744353646990. HINFO: read udp 192.168.171.65:56910->147.75.207.208:53: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[ERROR] plugin/errors: 2 6061867283059196170.1749449744353646990. HINFO: read udp 192.168.171.65:41558->147.75.207.208:53: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
I0831 06:00:26.444201 1 trace.go:205] Trace[443632888]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156 (31-Aug-2021 05:59:56.443) (total time: 30000ms):
Trace[443632888]: [30.000714943s] [30.000714943s] END
E0831 06:00:26.444300 1 reflector.go:127] pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
I0831 06:00:26.663892 1 trace.go:205] Trace[1496193015]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156 (31-Aug-2021 05:59:56.663) (total time: 30000ms):
Trace[1496193015]: [30.000666081s] [30.000666081s] END
E0831 06:00:26.663926 1 reflector.go:127] pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
I0831 06:00:26.898441 1 trace.go:205] Trace[60780408]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156 (31-Aug-2021 05:59:56.897) (total time: 30000ms):
Trace[60780408]: [30.00063881s] [30.00063881s] END
E0831 06:00:26.898474 1 reflector.go:127] pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: Get "https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
I0831 06:00:58.754593 1 trace.go:205] Trace[1304066831]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156 (31-Aug-2021 06:00:28.752) (total time: 30002ms):
Trace[1304066831]: [30.002491082s] [30.002491082s] END
E0831 06:00:58.754645 1 reflector.go:127] pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: Get "https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
I0831 06:00:59.102994 1 trace.go:205] Trace[170625356]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156 (31-Aug-2021 06:00:29.102) (total time: 30000ms):
Trace[170625356]: [30.000669079s] [30.000669079s] END
E0831 06:00:59.103055 1 reflector.go:127] pkg/mod/k8s.io/client-go@v0.19.2/tools/cache/reflector.go:156: Failed to watch *v1.Namespace: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
...
I am not sure whether this issue belongs here or under projectcalico/calico, so please correct me if I am wrong.
Hi @Bolodya1997 , you're in the right place 🙂
Thanks for the detailed report, this is very helpful! I noticed you used a public IP for the apiserver (`--apiserver-advertise-address=147.75.38.85` in `kubeadm init`). Could you try with the IP address configured on `eno2` on the master (`10.0.0.1`)?
If that doesn't fix the issue, it would be helpful if you could install `calivppctl` and attach the output of `calivppctl export`: https://docs.projectcalico.org/maintenance/troubleshoot/vpp
Thank you!
I was afraid that the k8s API server would listen only on the IP address given in `--apiserver-advertise-address=`, but it actually listens on all IP addresses.
So after changing this property to `10.0.0.1`, the only remaining question is how to make the TLS certificates work for the public IP, but that can be easily solved following this guide - https://blog.scottlowe.org/2019/07/30/adding-a-name-to-kubernetes-api-server-certificate/.
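For anyone who ends up here with the same setup, a minimal sketch of what the adjusted kubeadm init could look like - advertising the API server on the private eno2 address while keeping the public IP usable from outside (--apiserver-cert-extra-sans is one way to achieve what the linked blog post describes; the values below are the ones from this report):
#!/bin/sh
# Advertise the API server on the private (eno2) address used as the node IP ...
# ... and add the public IP as an extra SAN so the TLS certificate stays valid for it.
kubeadm init \
  --kubernetes-version 1.21.1 \
  --pod-network-cidr=192.168.0.0/16 \
  --skip-token-print \
  --apiserver-advertise-address=10.0.0.1 \
  --apiserver-cert-extra-sans=147.75.38.85
On an already-initialized cluster, the linked blog post covers the equivalent change (adding the public IP to the API server certificate's SANs) without re-running kubeadm init.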