Pod hangs in Terminating state due to missing slave interfaces
mlguerrero12 opened this issue · 0 comments
Scenario
Bond with two (in-container) interfaces.
k8s.v1.cni.cncf.io/networks: openshift-sriov-network-operator/test-sriov-for-bond-network,openshift-sriov-network-operator/test-sriov-for-bond-network,bond-testing/bond@bond0
{
  "cniVersion": "0.4.0",
  "name": "bond",
  "plugins": [
    {
      "ipam": {"type": "host-local", "subnet": "1.1.1.0/24"},
      "type": "bond",
      "ifname": "bond0",
      "mode": "active-backup",
      "failOverMac": 1,
      "linksInContainer": true,
      "miimon": "100",
      "mtu": 1300,
      "links": [ {"name": "net1"}, {"name": "net2"} ]
    }
  ]
}
Issue
When the pod was deleted, the DEL command was called on the bond CNI. It failed while detaching the slave links from the bond: for some reason, it couldn't bring a slave device back up (device or resource busy error).
The subsequent DEL commands of the SR-IOV CNI deleted the slave links, so every retry of the DEL command on the bond CNI kept failing because it could no longer find the bond's slave links.
Pod deletion could not complete (the pod hung in the Terminating state), which affected other operations.
"Error syncing pod, skipping" err="failed to "KillPodSandbox" for "d2b3c5db-ba5f-44a8-a7b9-0a8c087fd3dd" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for pod sandbox k8s_testpod-kgwvl_bond-testing_d2b3c5db-ba5f-44a8-a7b9-0a8c087fd3dd_0(24a237e7b3529293c682030db373fb6d902ffa861f60862beb60bcdaf93fa89f): error removing pod bond-testing_testpod-kgwvl from CNI network \"multus-cni-network\": plugin type=\"multus\" name=\"multus-cni-network\" failed (delete): delegateDel: error invoking ConflistDel - \"bond\": conflistDel: error in getting result from DelNetworkList: Failed to retrieve link objects from configuration file (&{NetConf:{CNIVersion:0.4.0 Name:bond Type:bond Capabilities:map[] IPAM:{Type:host-local} DNS:{Nameservers:[] Domain: Search:[] Options:[]} RawPrevResult:map[cniVersion:0.4.0 dns:map[] interfaces:[map[mac:c2:fb:59:b7:71:d3 name:bond0 sandbox:/var/run/netns/4fe7ca59-b4e2-4025-a199-894307cbe8b3]] ips:[map[address:1.1.1.6/24 gateway:1.1.1.1 interface:0 version:]]] PrevResult:} Mode:active-backup LinksContNs:true FailOverMac:1 Miimon:100 Links:[map[name:net1] map[name:net2]] MTU:1300}), error: Failed to confirm that link (net1) exists, error: Failed to lookup link name net1, error: Link not found"" pod="bond-testing/testpod-kgwvl" podUID=d2b3c5db-ba5f-44a8-a7b9-0a8c087fd3dd