openshift/sdn

OKD 3.11 - crio - failed to find plugin "loopback" in path [/opt/cni/bin/ /opt/loopback/bin]

tidbits03 opened this issue · 4 comments

I am currently running a large OKD v3.11.0+b750162-89 cluster with crio as the runtime. I am seeing an issue with pods getting stuck in the ContainerCreating state. Upon investigation, it appears pods are being created before the SDN is fully up and running; once the SDN is up, however, it still fails to create the pod.

Here is an example:

pod start time: Wed, 20 May 2020 06:35:34 -0400

events:

  Type     Reason                  Age                 From                                   Message
  ----     ------                  ----                ----                                   -------
  Warning  FailedCreatePodSandBox  55m                 kubelet, ip-X.ec2.internal  Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_fluent-bit-962zf_openshift-logging_a2bd2a99-9a85-11ea-a7a8-0287f3ab0050_0(301ea30ea0ad94070c5d934f3d1799cc4225afa865ff498619057bc08ad09b6d): failed to find plugin "loopback" in path [/opt/cni/bin/ /opt/loopback/bin]
  Warning  FailedCreatePodSandBox  54m                 kubelet, ip-X.ec2.internal  Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_fluent-bit-962zf_openshift-logging_a2bd2a99-9a85-11ea-a7a8-0287f3ab0050_0(99f7918f079c7e829d731926f01351419d931be67d00cac1de20726382c8810f): failed to find plugin "loopback" in path [/opt/cni/bin/ /opt/loopback/bin]
  Warning  FailedCreatePodSandBox  54m                 kubelet, ip-X.ec2.internal  Failed create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_fluent-bit-962zf_openshift-logging_a2bd2a99-9a85-11ea-a7a8-0287f3ab0050_0(9518362bebc1185be0c9df17e38dfd4931a64cd05aa948f7a75f2796673cc3e5): OpenShift SDN network process is not (yet?) available
  Warning  FailedCreatePodSandBox  4m (x220 over 54m)  kubelet, ip-X.ec2.internal  Failed create pod sandbox: rpc error: code = Unknown desc = pod sandbox with name "k8s_fluent-bit-962zf_openshift-logging_a2bd2a99-9a85-11ea-a7a8-0287f3ab0050_0" already exists

SDN start time: Wed, 20 May 2020 06:35:34 -0400

crio logs:

May 20 10:35:52 ip-X.ec2.internal crio[1555]: time="2020-05-20 10:35:52.173892500Z" level=error msg="Error deleting network: failed to find plugin "openshift-sdn" in path [/opt/cni/bin/ /opt/openshift-sdn/bin]"
May 20 10:36:08 ip-X.ec2.internal crio[1555]: time="2020-05-20 10:36:08.350781469Z" level=error msg="Error adding network: failed to find plugin "loopback" in path [/opt/cni/bin/ /opt/loopback/bin]"
May 20 10:36:08 ip-X.ec2.internal crio[1555]: time="2020-05-20 10:36:08.350805073Z" level=error msg="Error while adding to cni lo network: failed to find plugin "loopback" in path [/opt/cni/bin/ /opt/loopback/bin]"
May 20 10:36:08 ip-X.ec2.internal crio[1555]: time="2020-05-20 10:36:08.350822489Z" level=error msg="Error deleting network: failed to find plugin "openshift-sdn" in path [/opt/cni/bin/ /opt/openshift-sdn/bin]"
May 20 10:36:12 ip-X.ec2.internal crio[1555]: time="2020-05-20 10:36:12.674429655Z" level=error msg="Error adding network: OpenShift SDN network process is not (yet?) available"
May 20 10:36:12 ip-X.ec2.internal crio[1555]: time="2020-05-20 10:36:12.674450141Z" level=error msg="Error while adding to cni network: OpenShift SDN network process is not (yet?) available"
May 20 10:36:12 ip-X.ec2.internal crio[1555]: time="2020-05-20 10:36:12.678066342Z" level=error msg="Error deleting network: failed to send CNI request: Post http://dummy/: dial unix /var/run/openshift-sdn/cni-server.sock: connect: no such file or directory"
May 20 10:36:19 ip-X.ec2.internal crio[1555]: time="2020-05-20 10:36:19.536904826Z" level=error msg="Error adding network: OpenShift SDN network process is not (yet?) available"
May 20 10:36:19 ip-X.ec2.internal crio[1555]: time="2020-05-20 10:36:19.536930324Z" level=error msg="Error while adding to cni network: OpenShift SDN network process is not (yet?) available"
May 20 10:36:19 ip-X.ec2.internal crio[1555]: time="2020-05-20 10:36:19.540556043Z" level=error msg="Error deleting network: failed to send CNI request: Post http://dummy/: dial unix /var/run/openshift-sdn/cni-server.sock: connect: no such file or directory"

^ After the last log entry (May 20 10:36:19) there appear to be no more issues talking to the SDN.
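The transition point is when /var/run/openshift-sdn/cni-server.sock appears (the same socket the "no such file or directory" errors above refer to). As a rough illustration of that dependency, here is a hypothetical wait loop, not anything crio or openshift-sdn actually runs; the socket path comes from the logs above, the timeout is an arbitrary assumption, and `-e` is used rather than the stricter `-S` so the sketch is easy to try against an ordinary file:

```shell
#!/bin/sh
# Sketch: block until the openshift-sdn CNI server socket exists.
# wait_for_sdn [path] [timeout_seconds]
wait_for_sdn() {
  sock="${1:-/var/run/openshift-sdn/cni-server.sock}"
  timeout="${2:-60}"
  waited=0
  # -e for illustration; the real cni-server.sock is a unix socket (-S).
  while [ ! -e "$sock" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for $sock" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "SDN socket ready: $sock"
}
```

Until something like this gate is satisfied, any sandbox creation attempt fails with the "not (yet?) available" errors shown above.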

SDN startup logs:

2020/05/20 10:36:08 socat[3636] E connect(5, AF=1 "/var/run/openshift-sdn/cni-server.sock", 40): No such file or directory
User "sa" set.
Context "default-context" modified.
which: no openshift-sdn in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
I0520 10:36:18.909405    3563 start_network.go:193] Reading node configuration from /etc/origin/node/node-config.yaml
I0520 10:36:18.912000    3563 start_network.go:200] Starting node networking ip-X.ec2.internal (v3.11.0+d699176-406)
W0520 10:36:18.912136    3563 server.go:195] WARNING: all flags other than --config, --write-config-to, and --cleanup are deprecated. Please begin using a config file ASAP.
I0520 10:36:18.912165    3563 feature_gate.go:230] feature gates: &{map[]}
I0520 10:36:18.914229    3563 transport.go:160] Refreshing client certificate from store
I0520 10:36:18.914255    3563 certificate_store.go:131] Loading cert/key pair from "/etc/origin/node/certificates/kubelet-client-current.pem".
I0520 10:36:18.929569    3563 node.go:151] Initializing SDN node of type "redhat/openshift-ovs-networkpolicy" with configured hostname "ip-X.ec2.internal" (IP ""), iptables sync period "30s"
I0520 10:36:18.934523    3563 node.go:281] Starting openshift-sdn network plugin
I0520 10:36:19.214899    3563 sdn_controller.go:138] [SDN setup] full SDN setup required (Link not found)
I0520 10:36:19.643931    3563 vnids.go:148] Associate netid 12328631 to namespace "X" with mcEnabled false
....
W0520 10:36:19.908659    3563 pod.go:218] No sandbox for pod datadog-agent/datadog-agent-hv9p4
W0520 10:36:19.908684    3563 pod.go:218] No sandbox for pod openshift-monitoring/node-exporter-ps4bh
W0520 10:36:19.908690    3563 pod.go:218] No sandbox for pod openshift-node/sync-dw5cj
W0520 10:36:19.908696    3563 pod.go:218] No sandbox for pod openshift-sdn/ovs-gxfbq
W0520 10:36:19.908704    3563 pod.go:218] No sandbox for pod openshift-sdn/sdn-rkk56
I0520 10:36:19.908709    3563 node.go:352] Starting openshift-sdn pod manager
E0520 10:36:19.908931    3563 cniserver.go:150] failed to remove old pod info socket: remove /var/run/openshift-sdn: device or resource busy
E0520 10:36:19.908967    3563 cniserver.go:153] failed to remove contents of socket directory: remove /var/run/openshift-sdn: device or resource busy
I0520 10:36:19.913154    3563 node.go:379] openshift-sdn network plugin registering startup
I0520 10:36:19.913277    3563 node.go:397] openshift-sdn network plugin ready
I0520 10:36:19.915712    3563 network.go:97] Using iptables Proxier.
I0520 10:36:19.916200    3563 networkpolicy.go:331] SyncVNIDRules: 0 unused VNIDs
I0520 10:36:19.918356    3563 network.go:129] Tearing down userspace rules.
I0520 10:36:19.930669    3563 proxier.go:216] Setting proxy IP to X and initializing iptables
I0520 10:36:19.949401    3563 proxy.go:86] Starting multitenant SDN proxy endpoint filter
I0520 10:36:19.955340    3563 config.go:202] Starting service config controller
I0520 10:36:19.955358    3563 controller_utils.go:1025] Waiting for caches to sync for service config controller
I0520 10:36:19.955371    3563 network.go:231] Started Kubernetes Proxy on 0.0.0.0
I0520 10:36:19.955387    3563 start_network.go:232] Waiting for the SDN proxy startup to complete...
I0520 10:36:19.955386    3563 config.go:102] Starting endpoints config controller
I0520 10:36:19.955404    3563 controller_utils.go:1025] Waiting for caches to sync for endpoints config controller
I0520 10:36:19.955478    3563 network.go:55] Starting DNS on 127.0.0.1:53
I0520 10:36:19.956467    3563 server.go:76] Monitoring dnsmasq to point cluster queries to 127.0.0.1
I0520 10:36:19.956562    3563 logs.go:49] skydns: ready for queries on cluster.local. for tcp://127.0.0.1:53 [rcache 0]
I0520 10:36:19.956603    3563 logs.go:49] skydns: ready for queries on cluster.local. for udp://127.0.0.1:53 [rcache 0]

If I destroy the pod and let k8s recreate it, the new pod comes up just fine (in this case it's a DaemonSet, so it gets assigned to the same node).

FYI, I got more information from the kubelet logs:

2269 kuberuntime_manager.go:385] No sandbox for pod "fluent-bit-962zf_openshift-logging(a2bd2a99-9a85-11ea-a7a8-0287f3ab0050)" can be found. Need to start a new one

2269 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_fluent-bit-962zf_openshift-logging_a2bd2a99-9a85-11ea-a7a8-0287f3ab0050_0(301ea30ea0ad94070c5d934f3d1799cc4225afa865ff498619057bc08ad09b6d): failed to find plugin "loopback" in path [/opt/cni/bin/ /opt/loopback/bin]

If I do an ls in /opt/cni/bin, I find a loopback binary:

ls /opt/cni/bin/
host-local  loopback  openshift-sdn

Not sure what's missing; maybe the binary does not exist when the node is first starting up.
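That theory is easy to sanity-check: the "failed to find plugin" error is just an existence lookup across the listed directories. A hedged sketch of the same check, as a standalone function rather than crio's actual code (the plugin name and search paths match the error message; the function name is made up):

```shell
#!/bin/sh
# Sketch: mimic the CNI plugin lookup that produces the error above.
# check_cni_plugin <plugin-name> <dir>...
check_cni_plugin() {
  plugin="$1"
  shift
  for dir in "$@"; do
    # A plugin "exists" only if there is an executable file at dir/plugin.
    if [ -x "$dir/$plugin" ]; then
      echo "found $dir/$plugin"
      return 0
    fi
  done
  echo "failed to find plugin \"$plugin\" in path [$*]" >&2
  return 1
}
```

Running `check_cni_plugin loopback /opt/cni/bin /opt/loopback/bin` in a loop during node boot would show whether /opt/cni/bin is briefly empty (or not yet mounted/populated) when crio first tries to set up sandboxes.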

This repo is only used for OpenShift 4.2 and later; in earlier releases the openshift-sdn code is part of the origin repo. Although it sounds like your problem is not with openshift-sdn anyway; it's some sort of general OKD/crio/something problem.

/close

@danwinship: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.