k8snetworkplumbingwg/sriov-cni

sriov-cni del doesn't work when there is altName on the interace

SchSeba opened this issue · 6 comments

I found this one on RHEL 9.2 but I think it effects other distros

net1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether c2:f5:9e:e2:05:a2 brd ff:ff:ff:ff:ff:ff
altname enp216s0f1v9
altname ens3f1v9

and when we remove a pod and the sriov-cni tries to move back the nic and rename it we get this error

0s Warning FailedKillPod pod/client error killing pod: failed to "KillPodSandbox" for "56fd1e57-c9f3-4aae-8ab7-3f6564e0c675" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for pod sandbox k8s_client_seba_56fd1e57-c9f3-4aae-8ab7-3f6564e0c675_0(8b0f6b4a6d8d639e72d0d27b41719389f002e26e6e723c019c716ff43abdb5e1): error removing pod seba_client from CNI network \"multus-cni-network\": plugin type=\"multus\" name=\"multus-cni-network\" failed (delete): DelegateDel: error invoking DelegateDel - \"sriov\": error in getting result from DelNetwork: failed to move interface ens3f1v6 to init netns: file exists"

and the nic is moved to the host with the wrong name

69: dev69: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether c2:f5:9e:e2:05:a2 brd ff:ff:ff:ff:ff:ff
altname enp216s0f1v9
altname ens3f1v9

because of the altname a new pod can start but that is not the right way.

first step is to add the altName to netlink I already open a PR for that

vishvananda/netlink#862

I found same issue while pod being killed, and stuck on dead loop.

First StopPodSandbox error:
msg="StopPodSandbox for "48161e41da530ba0d7933911fe2c7f827265521591e03e2084743a3fa032526b" failed" error="failed to destroy network for sandbox "48161e41da530ba0d7933911fe2c7f827265521591e03e2084743a3fa032526b": plugin type="multus" name="multus-cni-network" failed (delete): delegateDel: error invoking ConflistDel - "sriov-o1c-host0": conflistDel: error in getting result from DelNetworkList: failed to move interface enp1s17f5 to init netns: file exists"

rest of StopPodSandbox error (dead loop)
msg="StopPodSandbox for "48161e41da530ba0d7933911fe2c7f827265521591e03e2084743a3fa032526b" failed" error="failed to destroy network for sandbox "48161e41da530ba0d7933911fe2c7f827265521591e03e2084743a3fa032526b": plugin type="multus" name="multus-cni-network" failed (delete): delegateDel: error invoking ConflistDel - "sriov-o1c-host0": conflistDel: error in getting result from DelNetworkList: Failed to get link neto1c: Link not found"

Since first StopPodSandbox failed, next attempts of StopPodSandbox, again calls sriov-cni plugin to release sriov interface, and now it don't find the interface (neto1c) (as already been renamed to enp1s17f5 from first StopPodSandbox) and throws error: Failed to get link neto1c: Link not found. Successive StopPodSandbox keep throwing this same error.

Stuck pod shows:
12: enp1s17f5: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
link/ether 0a:d7:02:03:00:06 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped missed mcast
1652 22 0 0 0 22
TX: bytes packets errors dropped carrier collsns
892 10 0 0 0 0
altname enp1s17f5

neto1c is not on the pod and enp1s17f5 is not on host namespace.

Duplicate "altname" and "interface name" lead to the "file exists" error while trying to move the pod interface to host namespace.

Replication of problem, with minimal steps:
1)
Option A: create tuntap interface
ip tuntap add mode tap enp138s1f3
Option B: There is VF interface with name enp138s1f3
2) create network namespace
ip netns add myns
3) mock SetupVF
ip link set enp138s1f3 name temp12 #LinkSetName, rename to temp IF name
ip link property add dev temp12 altname enp138s1f3 #add altname, setupVF is not doing this. who is setting ?
ip link set temp12 netns myns # move interface to netns
ip netns exec myns ip link set temp12 name neto1c #Set Pod IF name
ip netns exec myns ip link
12: neto1c: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether d2:9d:7b:9e:9d:58 brd ff:ff:ff:ff:ff:ff
altname enp138s1f3
4) mock ReleaseVF
ip netns exec myns ip link set neto1c name enp138s1f3 #LinkSetName, rename device
ip netns exec myns ip link set enp138s1f3 netns 1 #LinkSetNsFd, move device to init netns
RTNETLINK answers: File exists <----------- This is exact issue.
5) After error,
ip netns exec myns ip link
12: enp138s1f3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether d2:9d:7b:9e:9d:58 brd ff:ff:ff:ff:ff:ff altname enp138s1f3
<------------error is because of duplicate interface name and altname.

ip netns exec myns ip link property del dev enp138s1f3 altname enp138s1f3
Option A: no error
Option B: RTNETLINK answers: Invalid argument

Option B:
ip netns del myns
ip l # shows: dev57 interface name with altname enp138s1f3
ip link property del dev dev57 altname enp138s1f3
ip l # shows: dev57 interface without altname
ip link set dev57 name enp138s1f3

This proves we are getting this problem because "interface name" get renamed such that it become duplicate to "altname".
Should ReleaseVF remove altname before trying to rename the interface back to original name ?

How do we get altname at first place ?
ethtool -i ens787f0
driver: ixgbe
version: 5.10.0-6-amd64
firmware-version: 0x80001375, 1.2877.0
expansion-rom-version:
bus-info: 0000:54:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

echo 1 > /sys/class/net/ens787f0/device/sriov_numvfs
ip l # shows vf 0 under ens787f0 and also shows new enp84s16 interface (without altname)
ip link set enp84s16 name temp12
ip l # shows temp12 with altname enp84s16

The above "ip link set enp84s16 name temp12" is just supposed to rename interface but it is not supposed to add altname.
Point is if interface gets altname somehow, sriov-cni should be able to handle the cleanup anyway.

More comment on how do we get altname at first place ?

The feature AlternativeNamesPolicy allows to create alternate names for existing interfaces. It is space-separated list of policies by which the interface's alternative names should be set. Each of the policies may fail, and all successful policies are used. The available policies are "database", "onboard", "slot", "path", and "mac". If the kernel does not support the alternative names, then this setting will be ignored.

"database": The name is set based on entries in the udev's Hardware Database with the key "ID_NET_NAME_FROM_DATABASE".
"onboard": The name is set based on information given by the firmware for on-board devices, as exported by the udev property "ID_NET_NAME_ONBOARD".
"slot": The name is set based on information given by the firmware for hot-plug devices, as exported by the udev property "ID_NET_NAME_SLOT".
"path": The name is set based on the device's physical location, as exported by the udev property "ID_NET_NAME_PATH".
"mac": The name is set based on the device's persistent MAC address, as exported by the udev property "ID_NET_NAME_MAC".

uname -a
5.10.0-6-amd64

systemd --version
systemd 247

udevadm info /sys/class/net/enp59s17f7
ID_NET_NAME_PATH=enp59s17f7
ID_MODEL_FROM_DATABASE=Ethernet Virtual Function 700 Series
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link

/usr/lib/systemd/network/99-default.link
[Link]
NamePolicy=keep kernel database onboard slot path
AlternativeNamesPolicy=database onboard slot path
MACAddressPolicy=persistent

** At this point, "ip link set enp59s17f7 name temp12" sets "altname" too. This is because of match of "path" policy of AlternativeNamesPolicy i.e. existence of udev property ID_NET_NAME_PATH.

In altname presence, sriov-cni should handle the cleanup properly.

Hi @subeditara thanks for the information
can you please add the platform you are using?
and also the version of the systemd you have installed on your system

kernel: 5.10.0-6-amd64 Debian (Bullseye)
systemd: 247.3-7

Hi @subeditara thanks for the comment!

so looks like the problem was fixed in systemd version 254 I will try to continue and push the PR here that should just remove the alt-name if exist