Cannot configure numVfs for Mellanox NICs on an RHCOS node
pliurh opened this issue
https://bugzilla.redhat.com/show_bug.cgi?id=1733897
Description of problem:
When a SriovNetworkNodePolicy is created with vendor 15b3 (Mellanox), the VFs cannot be initialized.
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
1. Set up the bare-metal environment.
2. Install the SR-IOV operator.
3. Create a SriovNetworkNodePolicy on the Mellanox PF.
4. Check the node status:
   oc get sriovnetworknodestates.sriovnetwork.openshift.io -o yaml
5. Check the SR-IOV config daemon logs.
Actual results:
4. No VFs are created.
5. oc logs sriov-network-config-daemon-pc8gl
daemon logs:
I0729 03:56:58.218766 15417 mellanox_plugin.go:59] mellanox-plugin OnNodeStateAdd()
I0729 03:56:58.218800 15417 mellanox_plugin.go:66] mellanox-Plugin OnNodeStateChange()
I0729 03:56:58.218813 15417 mellanox_plugin.go:267] mellanox-plugin isMlnxNicAndInNode(): device 0000:5e:00.0
I0729 03:56:58.218823 15417 mellanox_plugin.go:181] mellanox-plugin getMlnxNicFwData(): for device 0000:5e:00.0
I0729 03:56:58.218828 15417 mellanox_plugin.go:252] mellanox-plugin isSinglePortNic(): device 0000:5e:00.0
I0729 03:56:58.218831 15417 mellanox_plugin.go:157] mellanox-plugin mstconfigReadData(): try to read [LINK_TYPE] for device 0000:5e:00.0
I0729 03:56:58.218854 15417 mellanox_plugin.go:169] mellanox-plugin runCommand(): mstconfig [-d 0000:5e:00.0 q LINK_TYPE]
I0729 03:56:58.225057 15417 writer.go:107] setNodeStateStatus(): syncStatus: InProgress, lastSyncError:
E0729 03:56:58.235747 15417 mellanox_plugin.go:163] mellanox-plugin mstconfigReadData(): failed : exit status 3 : -E- Failed to open the device
I0729 03:56:58.235796 15417 mellanox_plugin.go:157] mellanox-plugin mstconfigReadData(): try to read [LINK_TYPE_P2] for device 0000:5e:00.0
I0729 03:56:58.235819 15417 mellanox_plugin.go:169] mellanox-plugin runCommand(): mstconfig [-d 0000:5e:00.0 q LINK_TYPE_P2]
E0729 03:56:58.244693 15417 mellanox_plugin.go:163] mellanox-plugin mstconfigReadData(): failed : exit status 3 : -E- Failed to open the device
E0729 03:56:58.244779 15417 daemon.go:147] nodeStateAddHandler(): plugin mellanox_plugin error: exit status 3
I0729 03:56:58.244822 15417 daemon.go:240] nodeStateChangeHandler(): Interface not changed
W0729 03:56:58.244845 15417 daemon.go:115] Got an error: exit status 3
E0729 03:56:58.244916 15417 start.go:105] failed to run daemon: exit status 3
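The log shows the plugin running mstconfig -d 0000:5e:00.0 q LINK_TYPE, retrying with LINK_TYPE_P2, and finally surfacing only "exit status 3". Below is a minimal Go sketch of that query-with-fallback step, assuming a pluggable command runner so it can be exercised without mstconfig installed. The names result, runner, execRunner and queryFirst are hypothetical; this is not the operator's actual mellanox_plugin.go code. Unlike the log, it keeps mstconfig's stderr ("-E- Failed to open the device") in the returned error.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os/exec"
)

// result holds what one command invocation produced. exitCode is -1 when
// the command could not be started at all (e.g. mstconfig not installed).
type result struct {
	stdout, stderr string
	exitCode       int
}

// runner abstracts command execution (hypothetical, not the operator's API).
type runner func(name string, args ...string) result

// execRunner runs a real command via os/exec.
func execRunner(name string, args ...string) result {
	cmd := exec.Command(name, args...)
	var out, errBuf bytes.Buffer
	cmd.Stdout = &out
	cmd.Stderr = &errBuf
	code := 0
	if err := cmd.Run(); err != nil {
		var ee *exec.ExitError
		if errors.As(err, &ee) {
			code = ee.ExitCode()
		} else {
			code = -1
			errBuf.WriteString(err.Error())
		}
	}
	return result{stdout: out.String(), stderr: errBuf.String(), exitCode: code}
}

// queryFirst runs "mstconfig -d <pciAddr> q <param>" for each parameter in
// order and returns the output of the first query that succeeds. If all of
// them fail, the last failure is returned together with mstconfig's stderr.
func queryFirst(run runner, pciAddr string, params ...string) (string, error) {
	if len(params) == 0 {
		return "", errors.New("no mstconfig parameters given")
	}
	var lastErr error
	for _, p := range params {
		r := run("mstconfig", "-d", pciAddr, "q", p)
		if r.exitCode == 0 {
			return r.stdout, nil
		}
		lastErr = fmt.Errorf("mstconfig -d %s q %s: exit status %d: %s", pciAddr, p, r.exitCode, r.stderr)
	}
	return "", lastErr
}

func main() {
	// Same device and parameter order as in the daemon log above.
	out, err := queryFirst(execRunner, "0000:5e:00.0", "LINK_TYPE", "LINK_TYPE_P2")
	if err != nil {
		fmt.Println("query failed:", err)
		return
	}
	fmt.Print(out)
}
```

Passing a fake runner makes the fallback order easy to check; on the affected node the real runner would return exit code 3 for both parameters.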
Expected results:
VFs on the Mellanox NIC are created and work.
SriovNetworkNodeState CR of the test node:
https://pastebin.com/hKDNNFjD
sh-4.4# lspci -d 15b3:
5e:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
5e:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
60:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
60:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
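The policy selects PFs by vendor ID 15b3, which is also what lspci -d 15b3: filters on. A minimal Go sketch of the same filter over sysfs, assuming the standard /sys/bus/pci/devices/<addr>/vendor files (which hold values such as 0x15b3). matchVendor and devicesByVendor are hypothetical helpers, not part of the operator.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// matchVendor reports whether the contents of a sysfs "vendor" file
// (for example "0x15b3\n") match a vendor ID such as "15b3".
func matchVendor(vendorFile, vendorID string) bool {
	got := strings.TrimPrefix(strings.ToLower(strings.TrimSpace(vendorFile)), "0x")
	want := strings.TrimPrefix(strings.ToLower(strings.TrimSpace(vendorID)), "0x")
	return got == want
}

// devicesByVendor reads <sysRoot>/bus/pci/devices/*/vendor and returns the
// PCI addresses whose vendor matches vendorID, roughly what "lspci -d 15b3:"
// lists (hypothetical helper).
func devicesByVendor(sysRoot, vendorID string) ([]string, error) {
	dir := filepath.Join(sysRoot, "bus", "pci", "devices")
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var addrs []string
	for _, e := range entries {
		data, err := os.ReadFile(filepath.Join(dir, e.Name(), "vendor"))
		if err != nil {
			continue // skip entries without a readable vendor file
		}
		if matchVendor(string(data), vendorID) {
			addrs = append(addrs, e.Name())
		}
	}
	return addrs, nil
}

func main() {
	addrs, err := devicesByVendor("/sys", "15b3")
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	for _, a := range addrs {
		fmt.Println(a)
	}
}
```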
Running mstconfig manually from the config daemon pod fails the same way:
[root@dell-per740-14 /]# mstconfig q -d 0000:5e:00.0
-E- Failed to open the device
-E- Failed to open the device
What is the mstconfig version, and what is the card's FW version?
@moshe010 How can I check the FW version without mstconfig? It's an RHCOS node; mstconfig is not available in the host operating system.
# mstconfig -v
mstconfig, mstflint 4.9.0, built on May 7 2018, 05:37:24. Git SHA Hash: 59e7f1c
So is this a CoreOS node or RHEL 8.0? How can we reproduce this ourselves?
@moshe010 It's a CoreOS node.
I did some investigation on that node. I didn't find the root cause, but I found that addressing the device through PCI configuration cycles (a /proc/bus/pci path) instead of the PCI address works. It could be a workaround.
[root@dell-per740-14 /]# mstconfig -d 5e:00.0 query
-E- Failed to open the device
[root@dell-per740-14 /]# mstconfig -d /proc/bus/pci/5e/00.0 q
Device #1:
----------
Device type: ConnectX5
Name: N/A
Description: N/A
Device: /proc/bus/pci/5e/00.0
....
According to this: https://github.com/Mellanox/mstflint/blob/ddb4350e32c37dcbe8fe0d295eac05f2a23762db/README#L134
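The workaround amounts to passing mstconfig a /proc/bus/pci node instead of a PCI address. A small Go sketch of that conversion follows. procBusPCIPath is a hypothetical helper, and its handling of non-zero PCI domains assumes Linux's /proc/bus/pci naming (bus directory prefixed with the domain) rather than anything stated in this thread.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// procBusPCIPath maps a PCI address ("0000:5e:00.0" or short "5e:00.0")
// to the /proc/bus/pci node that mstconfig accepted ("/proc/bus/pci/5e/00.0").
// Assumption: as in Linux's /proc/bus/pci naming, domain 0000 is omitted
// and other domains appear as "dddd:bb" (hypothetical helper).
func procBusPCIPath(addr string) (string, error) {
	parts := strings.Split(strings.ToLower(addr), ":")
	if len(parts) == 2 {
		parts = append([]string{"0000"}, parts...) // short form: assume domain 0
	}
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" || !strings.Contains(parts[2], ".") {
		return "", fmt.Errorf("expected [domain:]bus:device.function, got %q", addr)
	}
	dir := parts[1]
	if parts[0] != "0000" {
		dir = parts[0] + ":" + parts[1]
	}
	return path.Join("/proc/bus/pci", dir, parts[2]), nil
}

func main() {
	for _, a := range []string{"0000:5e:00.0", "5e:00.0"} {
		p, err := procBusPCIPath(a)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("mstconfig -d %s q\n", p)
	}
}
```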
@pliurh can you schedule a debug session?
First, I would like to see whether mstflint works natively on CoreOS, as it relies on the kernel to access the device
(this is not officially supported by the mstflint package).
When mstconfig is given a PCI device address, it accesses the device through sysfs (/sys/bus/pci/devices/<d:b:d:f>/config).
Also, how can we reproduce this issue in-house?
Do we just install CoreOS following this guide? https://coreos.com/os/docs/latest/booting-with-iso.html
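Since mstconfig goes through the sysfs config file when given a PCI address, one quick check from inside the daemon pod is whether that file can be opened read-write. A minimal Go sketch, assuming that a read-only mount surfaces as EROFS from open(2); sysfsConfigPath and probeRW are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// sysfsConfigPath returns the config-space file mstconfig opens when it is
// given a PCI address: <sysRoot>/bus/pci/devices/<d:b:d.f>/config.
func sysfsConfigPath(sysRoot, addr string) string {
	return filepath.Join(sysRoot, "bus", "pci", "devices", addr, "config")
}

// probeRW opens the file read-write, the access mstflint relies on.
// readOnlyFS is true when the failure is EROFS, i.e. a read-only mount.
func probeRW(p string) (readOnlyFS bool, err error) {
	f, err := os.OpenFile(p, os.O_RDWR, 0)
	if err != nil {
		return errors.Is(err, syscall.EROFS), err
	}
	return false, f.Close()
}

func main() {
	p := sysfsConfigPath("/sys", "0000:5e:00.0")
	ro, err := probeRW(p)
	switch {
	case err == nil:
		fmt.Println(p, "is writable")
	case ro:
		fmt.Println(p, "is on a read-only filesystem")
	default:
		fmt.Println("probe failed:", err)
	}
}
```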
@adrianchiris There is no mstflint package available on CoreOS, and installing RPM packages is not allowed either. I run mstconfig
from the SR-IOV config daemon pod, which is privileged and uses hostNetwork.
That ISO file is just a boot image; CoreOS requires Ignition files to boot. You can find more information at [1].
Currently, all the deployment tools are for downstream images, which are not available without an internal pull-secret file. We're still trying to find a way for partners to install the latest OCP 4.2 build in-house.
sysfs was mounted read-only (ro)
in the config daemon container, which I believe stops mstconfig from working.
There is a related upstream issue containerd/containerd#3221
As a workaround, per our Slack discussion, we can try explicitly mounting /sys into the container.
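To confirm which /sys mount the container actually sees, and whether it is ro, the mount options can be read from /proc/self/mounts. A minimal Go sketch, assuming the usual whitespace-separated /proc/mounts format and that the last entry for a mount point is the one in effect; mountIsReadOnly is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// mountIsReadOnly scans text in /proc/mounts format
// ("device mountpoint fstype options dump pass") and reports whether the
// last entry for mountPoint (assumed to be the one in effect after any
// over-mounts) carries the "ro" option (hypothetical helper).
func mountIsReadOnly(mounts, mountPoint string) (ro, found bool) {
	for _, line := range strings.Split(mounts, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 4 || fields[1] != mountPoint {
			continue
		}
		found, ro = true, false
		for _, opt := range strings.Split(fields[3], ",") {
			if opt == "ro" {
				ro = true
			}
		}
	}
	return ro, found
}

func main() {
	data, err := os.ReadFile("/proc/self/mounts")
	if err != nil {
		fmt.Println("read mounts:", err)
		return
	}
	for _, mp := range []string{"/sys", "/host/sys"} {
		ro, found := mountIsReadOnly(string(data), mp)
		fmt.Printf("%s: mounted=%v read-only=%v\n", mp, found, ro)
	}
}
```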
In that case, it shouldn't work for other vendors either, right?
You need to echo into /sys/class/net... /sriov_numvfs.
@moshe010 sriov_numvfs is set through the container's /host mount point,
that is, /host/sys/bus/..../sriov_numvfs.
This is in contrast to mstflint, which relies on rw permissions on sysfs.
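Writing sriov_numvfs only needs write access to that one sysfs file, so the sysfs root can be a parameter: /host/sys as described above, or /sys if the container's own sysfs is writable. A minimal Go sketch with hypothetical names setNumVfs and setNumVfsSysfs. It also relies on the kernel behavior that a non-zero VF count cannot be changed directly to another non-zero value (the write fails with EBUSY), so it resets to 0 first.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// setNumVfs sets the VF count through read/write callbacks (hypothetical
// helper). The kernel refuses to change sriov_numvfs from one non-zero value
// to another (EBUSY), so a non-zero current value is reset to 0 first.
func setNumVfs(read func() (string, error), write func(string) error, n int) error {
	cur, err := read()
	if err != nil {
		return err
	}
	curN, err := strconv.Atoi(strings.TrimSpace(cur))
	if err != nil {
		return fmt.Errorf("parse sriov_numvfs %q: %w", cur, err)
	}
	if curN == n {
		return nil // already at the requested count
	}
	if curN != 0 && n != 0 {
		if err := write("0"); err != nil {
			return err
		}
	}
	return write(strconv.Itoa(n))
}

// setNumVfsSysfs targets <sysRoot>/bus/pci/devices/<addr>/sriov_numvfs, where
// sysRoot is "/host/sys" (host mount) or "/sys" (container sysfs, if writable).
func setNumVfsSysfs(sysRoot, addr string, n int) error {
	p := filepath.Join(sysRoot, "bus", "pci", "devices", addr, "sriov_numvfs")
	read := func() (string, error) {
		b, err := os.ReadFile(p)
		return string(b), err
	}
	write := func(s string) error {
		return os.WriteFile(p, []byte(s), 0)
	}
	return setNumVfs(read, write, n)
}

func main() {
	// Needs an SR-IOV capable PF at this address and write access to sysfs;
	// elsewhere it just prints the error.
	if err := setNumVfsSysfs("/host/sys", "0000:5e:00.0", 4); err != nil {
		fmt.Println("set numVfs:", err)
	}
}
```

Passing the read and write steps as functions keeps the 0-first ordering testable without a real PF.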
Here is the CRI-O bug: cri-o/cri-o#2625
Currently, RHCOS ships CRI-O 1.4.x, whose downstream build does not include this bug fix yet.
We tested it with the latest CRI-O release, 1.14.10,
and the issue is resolved (0b6c0ab93e02949bad98ac53a049baed36ab66ef).
I think it would be better to write the number of VFs through /sys rather than /host/sys.