oracle/linux-uek

SR-IOV broken for ConnectX-4 in Ethernet mode since UEK7.2

sdjerdj opened this issue · 12 comments

Description:
Since UEK7u2, SR-IOV is broken for Mellanox ConnectX-4 if one or both ports are configured for Ethernet mode.
The same setup works fine if the ports are configured for IB mode.
Additionally, everything is fine on UEK7u1 regardless of the configuration.

Diagnostic info:
Port 0 is in IB mode
Port 1 is in ETH mode

Output of lspci:

lspci |grep Mellanox

0b:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
0b:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
0b:00.2 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0b:00.3 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0b:00.4 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0b:00.5 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0b:00.6 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0b:00.7 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]

Output of dmesg:

dmesg |grep mlx

[ 1.349024] mlx5_core 0000:0b:00.0: firmware version: 12.28.2006
[ 1.349050] mlx5_core 0000:0b:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 4.071873] mlx5_core 0000:0b:00.0: Port module event: module 0, Cable plugged
[ 4.265254] mlx5_core 0000:0b:00.1: firmware version: 12.28.2006
[ 4.265304] mlx5_core 0000:0b:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 4.683438] mlx5_core 0000:0b:00.1: E-Switch: Total vports 10, per vport: max uc(1024) max mc(16384)
[ 4.687324] mlx5_core 0000:0b:00.1: Port module event: module 1, Cable plugged
[ 4.932149] mlx5_core 0000:0b:00.1: Supported tc offload range - chains: 4294967294, prios: 4294967295
[ 4.941115] mlx5_core 0000:0b:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0 basic)
[ 4.943224] mlx5_core 0000:0b:00.1 enp11s0f1np1: renamed from eth0
[ 5.655608] mlx5_core 0000:0b:00.1: Successfully registered panic handler for port 1
[ 5.773746] mlx5_core 0000:0b:00.1: mlx5_cmd_out_err:803:(pid 814): QUERY_HCA_CAP(0x100) op_mod(0x40) failed, status bad parameter(0x3), syndrome (0x5add95), err(-22)
[ 5.787044] mlx5_core 0000:0b:00.1: mlx5_device_enable_sriov:82:(pid 814): failed to enable eswitch SRIOV (-22)
[ 5.787047] mlx5_core 0000:0b:00.1: mlx5_sriov_enable:168:(pid 814): mlx5_device_enable_sriov failed : -22
[ 5.787222] mlx5_core 0000:0b:00.1 mlxen0: renamed from enp11s0f1np1
[ 6.213460] mlx5_core 0000:0b:00.0 mlxib0: renamed from ib0
[ 7.054459] mlx5_core 0000:0b:00.1 mlxen0: Link up
[ 7.057046] 8021q: adding VLAN 0 to HW filter on device mlxen0
[ 8.055388] IPv6: ADDRCONF(NETDEV_CHANGE): mlxen0: link becomes ready
[ 8.284101] IPv6: ADDRCONF(NETDEV_CHANGE): mlxib0: link becomes ready
[ 8.557689] IPv6: ADDRCONF(NETDEV_CHANGE): mlxib0: link becomes ready
[ 8.580025] br1: port 1(mlxen0) entered blocking state
[ 8.580028] br1: port 1(mlxen0) entered disabled state
[ 8.580067] device mlxen0 entered promiscuous mode
[ 8.580957] br1: port 1(mlxen0) entered blocking state
[ 8.580960] br1: port 1(mlxen0) entered listening state
[ 8.595823] mlx5_core 0000:0b:00.1: mlx5e_fs_set_rx_mode_work:843:(pid 156): S-tagged traffic will be dropped while C-tag vlan stripping is enabled
[ 10.635172] br1: port 1(mlxen0) entered learning state
[ 25.931048] br1: port 1(mlxen0) entered forwarding state

Manually trying to add VFs to the ETH port results in the following error:

echo 0 > /sys/class/net/mlxen0/device/sriov_numvfs
echo 7 > /sys/class/net/mlxen0/device/sriov_numvfs
-bash: echo: write error: Invalid argument

The same works just fine for the IB port:

echo 0 > /sys/class/net/mlxib0/device/sriov_numvfs
echo 7 > /sys/class/net/mlxib0/device/sriov_numvfs
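For anyone reproducing this, the two-step write above can be wrapped in a small helper that first checks the device limit. This is only a sketch: `set_vfs` is a hypothetical name, and it assumes the standard PCI sysfs attributes `sriov_totalvfs` and `sriov_numvfs` are present in the device directory.

```shell
# set_vfs (hypothetical helper): request a number of VFs on a PCI function.
#   $1 = sysfs device directory, e.g. /sys/class/net/mlxen0/device
#   $2 = number of VFs to request
set_vfs() {
    local dev_dir="$1"
    local want="$2"
    local total

    # sriov_totalvfs reports the maximum number of VFs the PF supports
    total=$(cat "$dev_dir/sriov_totalvfs") || return 1
    if [ "$want" -gt "$total" ]; then
        echo "requested $want VFs, device supports at most $total" >&2
        return 1
    fi

    # Reset to 0 first, mirroring the manual "echo 0" step above
    echo 0 > "$dev_dir/sriov_numvfs" || return 1
    # On the affected kernel this is the write that fails for the ETH port
    echo "$want" > "$dev_dir/sriov_numvfs" || return 1

    # Print the VF count the kernel actually accepted
    cat "$dev_dir/sriov_numvfs"
}
```

Run as root, for example `set_vfs /sys/class/net/mlxen0/device 7`. A non-zero return status corresponds to the "write error" seen in the manual attempt.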

I have confirmed that the same setup works properly with the latest RHCK kernel that comes with OL9.3 (5.14.0-362.8.1.el9_3.x86_64)

I have re-tested this with 5.15.0-202.135.2.el9uek.x86_64, and the issue is still present.
lspci shows the VFs for the IB port but none for the ETH port:

lspci

0e:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
0e:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
0e:00.2 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:00.3 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:00.4 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:00.5 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:00.6 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:00.7 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.2 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.3 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.4 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.5 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.6 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.7 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
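Counting the virtual functions per port type makes the gap easier to spot than reading the listing by eye. The sketch below assumes `lspci` output in the format shown above; `count_vfs` is a hypothetical helper name.

```shell
# count_vfs (hypothetical helper): read lspci output on stdin and print how
# many Mellanox virtual functions appear as InfiniBand vs. Ethernet devices.
count_vfs() {
    awk '/Mellanox/ && /Virtual Function/ {
             # Field 2 is the controller type, e.g. "Infiniband" or "Ethernet"
             if ($2 == "Infiniband") ib++
             else if ($2 == "Ethernet") eth++
         }
         END { printf "IB=%d ETH=%d\n", ib, eth }'
}
```

Usage: `lspci | count_vfs`. For the listing above this reports `IB=14 ETH=0`.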

dmesg shows the following:

dmesg |grep mlx

[ 1.364318] mlx5_core 0000:0e:00.0: firmware version: 12.28.2006
[ 1.364344] mlx5_core 0000:0e:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 4.090382] mlx5_core 0000:0e:00.0: Port module event: module 0, Cable plugged
[ 4.284359] mlx5_core 0000:0e:00.1: firmware version: 12.28.2006
[ 4.284398] mlx5_core 0000:0e:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 4.704293] mlx5_core 0000:0e:00.1: E-Switch: Total vports 17, per vport: max uc(1024) max mc(16384)
[ 4.708315] mlx5_core 0000:0e:00.1: Port module event: module 1, Cable plugged
[ 4.958517] mlx5_core 0000:0e:00.1: Supported tc offload range - chains: 4294967294, prios: 4294967295
[ 4.967739] mlx5_core 0000:0e:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0 basic)
[ 4.969721] mlx5_core 0000:0e:00.1 enp14s0f1np1: renamed from eth0
[ 5.677001] mlx5_core 0000:0e:00.1: Successfully registered panic handler for port 1
[ 5.825806] mlx5_core 0000:0e:00.1: mlx5_cmd_out_err:803:(pid 805): QUERY_HCA_CAP(0x100) op_mod(0x40) failed, status bad parameter(0x3), syndrome (0x5add95), err(-22)
[ 5.844531] mlx5_core 0000:0e:00.1: mlx5_device_enable_sriov:82:(pid 805): failed to enable eswitch SRIOV (-22)
[ 5.844535] mlx5_core 0000:0e:00.1: mlx5_sriov_enable:168:(pid 805): mlx5_device_enable_sriov failed : -22
[ 5.844723] mlx5_core 0000:0e:00.1 mlxen0: renamed from enp14s0f1np1
[ 6.185318] mlx5_core 0000:0e:00.0 mlxib0: renamed from ib0
[ 7.042738] mlx5_core 0000:0e:00.1 mlxen0: Link up
[ 7.045067] 8021q: adding VLAN 0 to HW filter on device mlxen0
[ 7.076354] IPv6: ADDRCONF(NETDEV_CHANGE): mlxen0: link becomes ready
[ 8.282229] IPv6: ADDRCONF(NETDEV_CHANGE): mlxib0: link becomes ready
[ 8.567892] br1: port 1(mlxen0) entered blocking state
[ 8.567894] br1: port 1(mlxen0) entered disabled state
[ 8.567923] device mlxen0 entered promiscuous mode
[ 8.568824] br1: port 1(mlxen0) entered blocking state
[ 8.568826] br1: port 1(mlxen0) entered listening state
[ 8.584119] mlx5_core 0000:0e:00.1: mlx5e_fs_set_rx_mode_work:843:(pid 137): S-tagged traffic will be dropped while C-tag vlan stripping is enabled
[ 10.572198] br1: port 1(mlxen0) entered learning state
[ 25.931080] br1: port 1(mlxen0) entered forwarding state

This issue appears to be caused by the combination of older firmware (which doesn't support querying the hca_cap_2 bit) and a newer upstream mlx5 driver. That is why the firmware command QUERY_HCA_CAP(0x100) fails in the log.

If you can update the firmware, that would be best. Alternatively, wait until we have a kernel with commit
6496357 ("net/mlx5: Query hca_cap_2 only when supported")

Thx!
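To check whether another host hits the same failure, the kernel log can be scanned for the QUERY_HCA_CAP error shown earlier. This is a rough sketch that assumes the log line format from this thread; `find_hca_cap_failures` is a hypothetical helper name.

```shell
# find_hca_cap_failures (hypothetical helper): read kernel log lines on stdin
# and print the PCI address of each mlx5 function whose QUERY_HCA_CAP(0x100)
# command failed.
find_hca_cap_failures() {
    # Keep only the failed QUERY_HCA_CAP lines, then extract the PCI address
    # that follows "mlx5_core " and drop duplicates
    grep 'QUERY_HCA_CAP(0x100).*failed' \
        | sed -n 's/.*mlx5_core \([0-9a-f:.]*\): .*/\1/p' \
        | sort -u
}
```

Usage: `dmesg | find_hca_cap_failures`. Empty output means no matching failure was logged.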

Thank you for the update!
No worries. Until the UEK kernel gets the patch, I can use the RHCK kernel, which works as expected.
Regarding the firmware, the card already has the latest version available for it.

Here is a bit of good news:
I managed to apply the above patch on top of the 5.15.0-203.146.3 UEK kernel, and I'm happy to report that the patch indeed resolves the issue:

[root@ol9 ~]# uname -r
5.15.0-203.146.888.el9uek.x86_64 <<== Patched test kernel
[root@ol9 ~]#

[root@ol9 ~]# dmesg |grep -e mlx -e eswitch
[ 1.352332] mlx5_core 0000:0e:00.0: firmware version: 12.28.2006
[ 1.352359] mlx5_core 0000:0e:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 4.075247] mlx5_core 0000:0e:00.0: Port module event: module 0, Cable plugged
[ 4.268187] mlx5_core 0000:0e:00.1: firmware version: 12.28.2006
[ 4.268227] mlx5_core 0000:0e:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 4.689720] mlx5_core 0000:0e:00.1: E-Switch: Total vports 17, per vport: max uc(1024) max mc(16384)
[ 4.693997] mlx5_core 0000:0e:00.1: Port module event: module 1, Cable plugged
[ 4.934593] mlx5_core 0000:0e:00.1: Supported tc offload range - chains: 4294967294, prios: 4294967295
[ 4.943461] mlx5_core 0000:0e:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0 basic)
[ 4.945410] mlx5_core 0000:0e:00.1 enp14s0f1np1: renamed from eth0
[ 5.647576] mlx5_core 0000:0e:00.1: Successfully registered panic handler for port 1
[ 6.032963] mlx5_core 0000:0e:00.1: E-Switch: Enable: mode(LEGACY), nvfs(14), active vports(15)
[ 6.091724] mlx5_core 0000:0e:00.0 mlxib0: renamed from ib0
[ 6.172684] mlx5_core 0000:0e:00.1 mlxen0: renamed from enp14s0f1np1
[ 7.012632] mlx5_core 0000:0e:00.1 mlxen0: Link up
[ 7.014728] 8021q: adding VLAN 0 to HW filter on device mlxen0
[ 7.048631] IPv6: ADDRCONF(NETDEV_CHANGE): mlxen0: link becomes ready
[ 8.234249] IPv6: ADDRCONF(NETDEV_CHANGE): mlxib0: link becomes ready
[ 8.496572] IPv6: ADDRCONF(NETDEV_CHANGE): mlxib0: link becomes ready
[ 8.517091] br1: port 1(mlxen0) entered blocking state
[ 8.517094] br1: port 1(mlxen0) entered disabled state
[ 8.517130] device mlxen0 entered promiscuous mode
[ 8.518280] br1: port 1(mlxen0) entered blocking state
[ 8.518281] br1: port 1(mlxen0) entered listening state
[ 8.535401] mlx5_core 0000:0e:00.1: mlx5e_fs_set_rx_mode_work:843:(pid 146): S-tagged traffic will be dropped while C-tag vlan stripping is enabled
[ 10.572228] br1: port 1(mlxen0) entered learning state
[ 25.932102] br1: port 1(mlxen0) entered forwarding state
[root@ol9 ~]#

[root@ol9 ~]# lspci |grep Mellanox
0e:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
0e:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
0e:00.2 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:00.3 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:00.4 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:00.5 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:00.6 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:00.7 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.2 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.3 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.4 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.5 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.6 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.7 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:02.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:02.2 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:02.3 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:02.4 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:02.5 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:02.6 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:02.7 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:03.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:03.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:03.2 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:03.3 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:03.4 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:03.5 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:03.6 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
[root@ol9 ~]#

Hopefully this info will make it somewhat easier to include the patch above in upcoming versions of the UEK kernel.

Perfect. Will close this ticket once a new UEK kernel comes out with the backport.

Just curious if there is any progress on this issue? The upstream kernel has had this patch since July 2023.

We'll get this into the next possible UEK7 errata release. Sorry for the delay.

This is currently scheduled for the May monthly errata release.

Hello,

Just a quick update:
It looks like the 5.15.0-206.153.7.el9uek.x86_64 kernel has resolved the issue:

$ lspci |grep Mellanox
0e:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
0e:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
0e:00.2 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:00.3 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:00.4 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:00.5 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:00.6 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:00.7 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.1 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.2 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.3 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.4 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.5 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.6 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:01.7 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:02.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:02.2 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:02.3 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:02.4 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:02.5 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:02.6 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:02.7 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:03.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:03.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:03.2 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:03.3 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:03.4 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:03.5 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
0e:03.6 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
$

$ dmesg |grep mlx5_core
[ 1.568472] mlx5_core 0000:0e:00.0: firmware version: 12.28.2006
[ 1.568500] mlx5_core 0000:0e:00.0: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 4.297432] mlx5_core 0000:0e:00.0: Port module event: module 0, Cable plugged
[ 4.491212] mlx5_core 0000:0e:00.1: firmware version: 12.28.2006
[ 4.491257] mlx5_core 0000:0e:00.1: 63.008 Gb/s available PCIe bandwidth (8.0 GT/s PCIe x8 link)
[ 4.912725] mlx5_core 0000:0e:00.1: E-Switch: Total vports 17, per vport: max uc(1024) max mc(16384)
[ 4.916861] mlx5_core 0000:0e:00.1: Port module event: module 1, Cable plugged
[ 5.160445] mlx5_core 0000:0e:00.1: Supported tc offload range - chains: 4294967294, prios: 4294967295
[ 5.169426] mlx5_core 0000:0e:00.1: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0 basic)
[ 5.171692] mlx5_core 0000:0e:00.1 enp14s0f1np1: renamed from eth0
[ 6.024422] mlx5_core 0000:0e:00.1: Successfully registered panic handler for port 1
[ 6.406512] mlx5_core 0000:0e:00.1: E-Switch: Enable: mode(LEGACY), nvfs(14), active vports(15)
$

Hi, it was actually fixed back in https://github.com/oracle/linux-uek/commits/v5.15.0-206.149.3, but I had been waiting to update this issue until we published the RPMs. They also appear to be available now. Thank you for your patience.
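A quick way to tell whether a UEK7 host is at or past the fixed release is to compare its kernel release string against the tag mentioned above. This is a sketch under some assumptions: it relies on GNU `sort -V`, `kernel_has_fix` is a hypothetical helper name, and the comparison is only meaningful for 5.15.0 UEK kernels (the RHCK 5.14 kernel works but would sort lower, and a locally patched build of an older release would not be detected).

```shell
# kernel_has_fix (hypothetical helper): succeed if the given UEK7 kernel
# release string sorts at or after the first fixed tag named in this thread.
#   $1 = kernel release, e.g. the output of `uname -r`
kernel_has_fix() {
    local fixed="5.15.0-206.149.3"
    local first

    # sort -V orders version strings; the lower version ends up on line 1
    first=$(printf '%s\n%s\n' "$fixed" "$1" | sort -V | head -n 1)

    # If the fixed tag sorts first (or is equal), the kernel is new enough
    [ "$first" = "$fixed" ]
}
```

Usage: `kernel_has_fix "$(uname -r)" && echo "backport should be present"`.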