tkestack/galaxy

Galaxy can not create pod sucess with containerd runtime

Huimintai opened this issue · 2 comments

TKEStack with containerd runtime engine can not create pods sucess with errors:

# kubectl get pods -n kube-system
NAME                                     READY   STATUS              RESTARTS   AGE
coredns-ccc77fb9d-d8vsj                  1/1     Running             0          17h
coredns-ccc77fb9d-l2jn2                  1/1     Running             0          17h
etcd-10.0.32.211                         1/1     Running             0          17h
flannel-rcf9p                            1/1     Running             0          17h
galaxy-daemonset-b88np                   1/1     Running             0          17h
kube-apiserver-10.0.32.211               1/1     Running             0          17h
kube-controller-manager-10.0.32.211      1/1     Running             0          17h
kube-proxy-llkg6                         1/1     Running             0          17h
kube-scheduler-10.0.32.211               1/1     Running             0          17h
metrics-server-v0.3.6-59c66b5dfd-57zz4   2/2     Running             0          17h
metrics-server-v0.3.6-794ccd69c8-6zdrg   0/2     ContainerCreating   0          17h
# kubectl describe pods metrics-server-v0.3.6-794ccd69c8-6zdrg -n kube-system
 Warning  FailedCreatePodSandBox  94s (x4848 over 17h)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "2869e68816e95f2eeaa62f26905c049fd4e70240e5a73bf4c30875610d9c8aef": galaxy returns: fail to establish network map[]:failed to open netns "/var/run/netns/cni-93beae62-e333-54c2-12f3-49069a567f4b": failed to Statfs "/var/run/netns/cni-93beae62-e333-54c2-12f3-49069a567f4b": no such file or directory

The galaxy error log:

I0713 02:08:18.298218  184815 server.go:114] ADD metrics-server-v0.3.6-794ccd69c8-6zdrg_kube-system, 16aedda9d4d54b18301610162a4e4d24397a2455fb49a1b2ec6fe270084e9585, /var/run/netns/cni-aee74e86-14a1-fcae-26bd-e96cdf62fa02, [], Jul 13 02:08:18.298209+
I0713 02:08:18.300783  184815 cni.go:93] delegate add 16aedda9d4d54b18301610162a4e4d24397a2455fb49a1b2ec6fe270084e9585 args K8S_POD_NAME=metrics-server-v0.3.6-794ccd69c8-6zdrg;K8S_POD_INFRA_CONTAINER_ID=16aedda9d4d54b18301610162a4e4d24397a2455fb49a1b2ec6fe270084e9585;IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system conf {"delegate":{"type":"galaxy-veth"},"name":"galaxy-flannel","subnetFile":"/run/flannel/subnet.env","type":"galaxy-flannel"}
E0713 02:08:18.335744  184815 cni.go:146] fail to add network map[]: failed to open netns "/var/run/netns/cni-aee74e86-14a1-fcae-26bd-e96cdf62fa02": failed to Statfs "/var/run/netns/cni-aee74e86-14a1-fcae-26bd-e96cdf62fa02": no such file or directory, begin to rollback and delete it
I0713 02:08:18.335898  184815 cni.go:114] delegate del 16aedda9d4d54b18301610162a4e4d24397a2455fb49a1b2ec6fe270084e9585 args K8S_POD_NAME=metrics-server-v0.3.6-794ccd69c8-6zdrg;K8S_POD_INFRA_CONTAINER_ID=16aedda9d4d54b18301610162a4e4d24397a2455fb49a1b2ec6fe270084e9585;IgnoreUnknown=1;K8S_POD_NAMESPACE=kube-system conf {"delegate":{"type":"galaxy-veth"},"name":"galaxy-flannel","subnetFile":"/run/flannel/subnet.env","type":"galaxy-flannel"}
W0713 02:08:18.342705  184815 cni.go:148] fail to delete cni in rollback <nil>

But when I do not install galaxy the metrics-server can runnning well:

root@VM-32-165-ubuntu:~# kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
coredns-ccc77fb9d-qgx2g                    1/1     Running   1          9m46s
coredns-ccc77fb9d-wlb82                    1/1     Running   1          9m46s
etcd-vm-32-165-ubuntu                      1/1     Running   2          10m
kube-apiserver-vm-32-165-ubuntu            1/1     Running   2          10m
kube-controller-manager-vm-32-165-ubuntu   1/1     Running   1          10m
kube-proxy-8sqdn                           1/1     Running   1          9m46s
kube-scheduler-vm-32-165-ubuntu            1/1     Running   7          6m25s
metrics-server-v0.3.6-794ccd69c8-wfv7d     2/2     Running   3          9m32s

Also when I install community flannel the metrics-server also can running well withhout any errors:

# kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
coredns-ccc77fb9d-qgx2g                    1/1     Running   1          13m
coredns-ccc77fb9d-wlb82                    1/1     Running   1          13m
etcd-vm-32-165-ubuntu                      1/1     Running   2          14m
kube-apiserver-vm-32-165-ubuntu            1/1     Running   2          14m
kube-controller-manager-vm-32-165-ubuntu   1/1     Running   1          14m
kube-flannel-ds-2grmk                      1/1     Running   0          69s
kube-proxy-8sqdn                           1/1     Running   1          13m
kube-scheduler-vm-32-165-ubuntu            1/1     Running   7          10m
metrics-server-v0.3.6-794ccd69c8-wfv7d     2/2     Running   3          13m
root@VM-32-165-ubuntu:~#
root@VM-32-165-ubuntu:~#
root@VM-32-165-ubuntu:~# kubectl delete pods metrics-server-v0.3.6-794ccd69c8-wfv7d -n kube-system
pod "metrics-server-v0.3.6-794ccd69c8-wfv7d" deleted
root@VM-32-165-ubuntu:~#
root@VM-32-165-ubuntu:~#
root@VM-32-165-ubuntu:~# kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
coredns-ccc77fb9d-qgx2g                    1/1     Running   1          13m
coredns-ccc77fb9d-wlb82                    1/1     Running   1          13m
etcd-vm-32-165-ubuntu                      1/1     Running   2          14m
kube-apiserver-vm-32-165-ubuntu            1/1     Running   2          14m
kube-controller-manager-vm-32-165-ubuntu   1/1     Running   1          14m
kube-flannel-ds-2grmk                      1/1     Running   0          93s
kube-proxy-8sqdn                           1/1     Running   1          13m
kube-scheduler-vm-32-165-ubuntu            1/1     Running   7          10m
metrics-server-v0.3.6-794ccd69c8-grvvq     2/2     Running   0          15s
root@VM-32-165-ubuntu:~# ls /var/run/netns/
cni-62aefc67-2e1a-3287-bb49-123ffc5eb62a  cni-99a91844-ea0c-96ef-79ae-5b43e1b5aa28
cni-99744035-01d0-1a18-2ec9-4a94c68cf683

This is community flannel CNI:

root@VM-32-165-ubuntu:~# kubectl get pods -n kube-system -o wide
NAME                                       READY   STATUS    RESTARTS   AGE     IP             NODE               NOMINATED NODE   READINESS GATES
coredns-ccc77fb9d-65pn9                    1/1     Running   0          4m25s   10.244.0.226   vm-32-165-ubuntu   <none>           <none>
coredns-ccc77fb9d-gvbf9                    1/1     Running   0          4m25s   10.244.0.227   vm-32-165-ubuntu   <none>           <none>
etcd-vm-32-165-ubuntu                      1/1     Running   2          5h13m   10.0.32.165    vm-32-165-ubuntu   <none>           <none>
kube-apiserver-vm-32-165-ubuntu            1/1     Running   2          5h13m   10.0.32.165    vm-32-165-ubuntu   <none>           <none>
kube-controller-manager-vm-32-165-ubuntu   1/1     Running   1          5h13m   10.0.32.165    vm-32-165-ubuntu   <none>           <none>
kube-flannel-ds-2grmk                      1/1     Running   0          5h1m    10.0.32.165    vm-32-165-ubuntu   <none>           <none>
kube-proxy-8sqdn                           1/1     Running   1          5h13m   10.0.32.165    vm-32-165-ubuntu   <none>           <none>
kube-scheduler-vm-32-165-ubuntu            1/1     Running   7          5h9m    10.0.32.165    vm-32-165-ubuntu   <none>           <none>
metrics-server-v0.3.6-794ccd69c8-rwcx9     2/2     Running   0          4m25s   10.244.0.225   vm-32-165-ubuntu   <none>           <none>
root@VM-32-165-ubuntu:~#
root@VM-32-165-ubuntu:~# ls /var/run/netns/
cni-05418fc3-ad75-1cbb-1705-b1fb687e7b74  cni-0e68f071-ba2b-cd80-1a00-19b97cea41fa  cni-b53efb6b-431f-acd8-9ee3-484ee3dda141

This is community flannel CNI:

root@VM-32-165-ubuntu:~# kubectl get pods -n kube-system -o wide
NAME                                       READY   STATUS    RESTARTS   AGE     IP             NODE               NOMINATED NODE   READINESS GATES
coredns-ccc77fb9d-65pn9                    1/1     Running   0          4m25s   10.244.0.226   vm-32-165-ubuntu   <none>           <none>
coredns-ccc77fb9d-gvbf9                    1/1     Running   0          4m25s   10.244.0.227   vm-32-165-ubuntu   <none>           <none>
etcd-vm-32-165-ubuntu                      1/1     Running   2          5h13m   10.0.32.165    vm-32-165-ubuntu   <none>           <none>
kube-apiserver-vm-32-165-ubuntu            1/1     Running   2          5h13m   10.0.32.165    vm-32-165-ubuntu   <none>           <none>
kube-controller-manager-vm-32-165-ubuntu   1/1     Running   1          5h13m   10.0.32.165    vm-32-165-ubuntu   <none>           <none>
kube-flannel-ds-2grmk                      1/1     Running   0          5h1m    10.0.32.165    vm-32-165-ubuntu   <none>           <none>
kube-proxy-8sqdn                           1/1     Running   1          5h13m   10.0.32.165    vm-32-165-ubuntu   <none>           <none>
kube-scheduler-vm-32-165-ubuntu            1/1     Running   7          5h9m    10.0.32.165    vm-32-165-ubuntu   <none>           <none>
metrics-server-v0.3.6-794ccd69c8-rwcx9     2/2     Running   0          4m25s   10.244.0.225   vm-32-165-ubuntu   <none>           <none>
root@VM-32-165-ubuntu:~#
root@VM-32-165-ubuntu:~# ls /var/run/netns/
cni-05418fc3-ad75-1cbb-1705-b1fb687e7b74  cni-0e68f071-ba2b-cd80-1a00-19b97cea41fa  cni-b53efb6b-431f-acd8-9ee3-484ee3dda141

galaxy use docker's sock to get dockerclient, you can se the go mod github.com/docker/engine-api v0.4.0. it is a better idea to support different cri is to use cri interface.