OKD 3.11 - crio - failed to find plugin "loopback" in path [/opt/cni/bin/ /opt/loopback/bin]
tidbits03 opened this issue · 4 comments
I am currently running a large OKD (v3.11.0+b750162-89)
cluster with crio as the runtime. I am seeing an issue with pods getting stuck in a ContainerCreating
state. Upon investigation, it appears pods are being created before the SDN is fully up and running; once the SDN is up, however, pod creation still fails.
Here is an example:
pod start time: Wed, 20 May 2020 06:35:34 -0400
events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 55m kubelet, ip-X.ec2.internal Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_fluent-bit-962zf_openshift-logging_a2bd2a99-9a85-11ea-a7a8-0287f3ab0050_0(301ea30ea0ad94070c5d934f3d1799cc4225afa865ff498619057bc08ad09b6d): failed to find plugin "loopback" in path [/opt/cni/bin/ /opt/loopback/bin]
Warning FailedCreatePodSandBox 54m kubelet, ip-X.ec2.internal Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_fluent-bit-962zf_openshift-logging_a2bd2a99-9a85-11ea-a7a8-0287f3ab0050_0(99f7918f079c7e829d731926f01351419d931be67d00cac1de20726382c8810f): failed to find plugin "loopback" in path [/opt/cni/bin/ /opt/loopback/bin]
Warning FailedCreatePodSandBox 54m kubelet, ip-X.ec2.internal Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_fluent-bit-962zf_openshift-logging_a2bd2a99-9a85-11ea-a7a8-0287f3ab0050_0(9518362bebc1185be0c9df17e38dfd4931a64cd05aa948f7a75f2796673cc3e5): OpenShift SDN network process is not (yet?) available
Warning FailedCreatePodSandBox 4m (x220 over 54m) kubelet, ip-X.ec2.internal Failed create pod sandbox: rpc error: code = Unknown desc = pod sandbox with name "k8s_fluent-bit-962zf_openshift-logging_a2bd2a99-9a85-11ea-a7a8-0287f3ab0050_0" already exists
SDN start time: Wed, 20 May 2020 06:35:34 -0400
crio logs:
May 20 10:35:52 ip-X.ec2.internal crio[1555]: time="2020-05-20 10:35:52.173892500Z" level=error msg="Error deleting network: failed to find plugin "openshift-sdn" in path [/opt/cni/bin/ /opt/openshift-sdn/bin]"
May 20 10:36:08 ip-X.ec2.internal crio[1555]: time="2020-05-20 10:36:08.350781469Z" level=error msg="Error adding network: failed to find plugin "loopback" in path [/opt/cni/bin/ /opt/loopback/bin]"
May 20 10:36:08 ip-X.ec2.internal crio[1555]: time="2020-05-20 10:36:08.350805073Z" level=error msg="Error while adding to cni lo network: failed to find plugin "loopback" in path [/opt/cni/bin/ /opt/loopback/bin]"
May 20 10:36:08 ip-X.ec2.internal crio[1555]: time="2020-05-20 10:36:08.350822489Z" level=error msg="Error deleting network: failed to find plugin "openshift-sdn" in path [/opt/cni/bin/ /opt/openshift-sdn/bin]"
May 20 10:36:12 ip-X.ec2.internal crio[1555]: time="2020-05-20 10:36:12.674429655Z" level=error msg="Error adding network: OpenShift SDN network process is not (yet?) available"
May 20 10:36:12 ip-X.ec2.internal crio[1555]: time="2020-05-20 10:36:12.674450141Z" level=error msg="Error while adding to cni network: OpenShift SDN network process is not (yet?) available"
May 20 10:36:12 ip-X.ec2.internal crio[1555]: time="2020-05-20 10:36:12.678066342Z" level=error msg="Error deleting network: failed to send CNI request: Post http://dummy/: dial unix /var/run/openshift-sdn/cni-server.sock: connect: no such file or directory"
May 20 10:36:19 ip-X.ec2.internal crio[1555]: time="2020-05-20 10:36:19.536904826Z" level=error msg="Error adding network: OpenShift SDN network process is not (yet?) available"
May 20 10:36:19 ip-X.ec2.internal crio[1555]: time="2020-05-20 10:36:19.536930324Z" level=error msg="Error while adding to cni network: OpenShift SDN network process is not (yet?) available"
May 20 10:36:19 ip-X.ec2.internal crio[1555]: time="2020-05-20 10:36:19.540556043Z" level=error msg="Error deleting network: failed to send CNI request: Post http://dummy/: dial unix /var/run/openshift-sdn/cni-server.sock: connect: no such file or directory"
^ After the last log entry (May 20 10:36:19) there appear to be no more issues talking to the SDN.
SDN startup logs:
2020/05/20 10:36:08 socat[3636] E connect(5, AF=1 "/var/run/openshift-sdn/cni-server.sock", 40): No such file or directory
User "sa" set.
Context "default-context" modified.
which: no openshift-sdn in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
I0520 10:36:18.909405 3563 start_network.go:193] Reading node configuration from /etc/origin/node/node-config.yaml
I0520 10:36:18.912000 3563 start_network.go:200] Starting node networking ip-X.ec2.internal (v3.11.0+d699176-406)
W0520 10:36:18.912136 3563 server.go:195] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
I0520 10:36:18.912165 3563 feature_gate.go:230] feature gates: &{map[]}
I0520 10:36:18.914229 3563 transport.go:160] Refreshing client certificate from store
I0520 10:36:18.914255 3563 certificate_store.go:131] Loading cert/key pair from "/etc/origin/node/certificates/kubelet-client-current.pem".
I0520 10:36:18.929569 3563 node.go:151] Initializing SDN node of type "redhat/openshift-ovs-networkpolicy" with configured hostname "ip-X.ec2.internal" (IP ""), iptables sync period "30s"
I0520 10:36:18.934523 3563 node.go:281] Starting openshift-sdn network plugin
I0520 10:36:19.214899 3563 sdn_controller.go:138] [SDN setup] full SDN setup required (Link not found)
I0520 10:36:19.643931 3563 vnids.go:148] Associate netid 12328631 to namespace "X" with mcEnabled false
....
W0520 10:36:19.908659 3563 pod.go:218] No sandbox for pod datadog-agent/datadog-agent-hv9p4
W0520 10:36:19.908684 3563 pod.go:218] No sandbox for pod openshift-monitoring/node-exporter-ps4bh
W0520 10:36:19.908690 3563 pod.go:218] No sandbox for pod openshift-node/sync-dw5cj
W0520 10:36:19.908696 3563 pod.go:218] No sandbox for pod openshift-sdn/ovs-gxfbq
W0520 10:36:19.908704 3563 pod.go:218] No sandbox for pod openshift-sdn/sdn-rkk56
I0520 10:36:19.908709 3563 node.go:352] Starting openshift-sdn pod manager
E0520 10:36:19.908931 3563 cniserver.go:150] failed to remove old pod info socket: remove /var/run/openshift-sdn: device or resource busy
E0520 10:36:19.908967 3563 cniserver.go:153] failed to remove contents of socket directory: remove /var/run/openshift-sdn: device or resource busy
I0520 10:36:19.913154 3563 node.go:379] openshift-sdn network plugin registering startup
I0520 10:36:19.913277 3563 node.go:397] openshift-sdn network plugin ready
I0520 10:36:19.915712 3563 network.go:97] Using iptables Proxier.
I0520 10:36:19.916200 3563 networkpolicy.go:331] SyncVNIDRules: 0 unused VNIDs
I0520 10:36:19.918356 3563 network.go:129] Tearing down userspace rules.
I0520 10:36:19.930669 3563 proxier.go:216] Setting proxy IP to X and initializing iptables
I0520 10:36:19.949401 3563 proxy.go:86] Starting multitenant SDN proxy endpoint filter
I0520 10:36:19.955340 3563 config.go:202] Starting service config controller
I0520 10:36:19.955358 3563 controller_utils.go:1025] Waiting for caches to sync for service config controller
I0520 10:36:19.955371 3563 network.go:231] Started Kubernetes Proxy on 0.0.0.0
I0520 10:36:19.955387 3563 start_network.go:232] Waiting for the SDN proxy startup to complete...
I0520 10:36:19.955386 3563 config.go:102] Starting endpoints config controller
I0520 10:36:19.955404 3563 controller_utils.go:1025] Waiting for caches to sync for endpoints config controller
I0520 10:36:19.955478 3563 network.go:55] Starting DNS on 127.0.0.1:53
I0520 10:36:19.956467 3563 server.go:76] Monitoring dnsmasq to point cluster queries to 127.0.0.1
I0520 10:36:19.956562 3563 logs.go:49] skydns: ready for queries on cluster.local. for tcp://127.0.0.1:53 [rcache 0]
I0520 10:36:19.956603 3563 logs.go:49] skydns: ready for queries on cluster.local. for udp://127.0.0.1:53 [rcache 0]
If I destroy the pod and allow k8s to recreate it, the new pod comes up just fine (in this case it's a daemonset, so it gets assigned to the same node).
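As a stopgap, the stuck pods can be found and deleted in bulk so the kubelet rebuilds their sandboxes once the SDN is ready. A rough sketch, assuming the plain-text output of `oc get pods --all-namespaces --no-headers` (columns: namespace, name, ready, status, restarts, age) — verify the column layout on your cluster first:

```shell
# Filter rows whose STATUS column is ContainerCreating and print
# "namespace pod" pairs, one per line.
stuck_pods() {
  awk '$4 == "ContainerCreating" { print $1, $2 }'
}

# Usage against a live cluster (not run here):
#   oc get pods --all-namespaces --no-headers | stuck_pods
#   oc get pods --all-namespaces --no-headers | stuck_pods | \
#     while read ns pod; do oc delete pod "$pod" -n "$ns"; done
```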
FYI, I got more information from looking at the kubelet logs:
2269 kuberuntime_manager.go:385] No sandbox for pod "fluent-bit-962zf_openshift-logging(a2bd2a99-9a85-11ea-a7a8-0287f3ab0050)" can be found. Need to start a new one
2269 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_fluent-bit-962zf_openshift-logging_a2bd2a99-9a85-11ea-a7a8-0287f3ab0050_0(301ea30ea0ad94070c5d934f3d1799cc4225afa865ff498619057bc08ad09b6d): failed to find plugin "loopback" in path [/opt/cni/bin/ /opt/loopback/bin]
If I do an `ls` in /opt/cni/bin, I find a loopback binary:
ls /opt/cni/bin/
host-local loopback openshift-sdn
Not sure what's missing; maybe it does not exist when the node is first starting up.
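For what it's worth, the lookup crio is failing on can be reproduced by hand. A minimal sketch — the two directories below are taken from the error message above; crio's actual search list comes from `plugin_dirs` in its configuration, so check that on the affected node:

```shell
# check_cni_plugin PLUGIN DIR...
# Mimics the CNI plugin lookup: succeed if PLUGIN exists as an
# executable file in any of the given directories.
check_cni_plugin() {
  plugin=$1; shift
  for dir in "$@"; do
    if [ -x "$dir/$plugin" ]; then
      echo "found $plugin in $dir"
      return 0
    fi
  done
  echo "failed to find plugin \"$plugin\" in path [$*]" >&2
  return 1
}

# Example using the paths from the error message:
# check_cni_plugin loopback /opt/cni/bin /opt/loopback/bin
```

Running this right after boot versus after the node settles would show whether the binary is genuinely absent during early startup.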
This repo is only used for OpenShift 4.2 and later; in earlier releases the openshift-sdn code is part of the origin repo. That said, it sounds like your problem is not with openshift-sdn anyway; it's some sort of general OKD/crio/something problem.
/close
@danwinship: Closing this issue.
In response to this:
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.