Mellanox/mstflint

mstconfig not working w/ ConnectX-3

kfandre opened this issue · 7 comments

This is the same issue as was previously submitted but closed here: #33

I am unable to enable SR-IOV on this card because the tool can't query or change the config.

I have the same card in a Windows machine and get the same problem there using the latest WinMFT install from the NVIDIA/Mellanox site.

I also tried the Mellanox OFED drivers installed on a supported Linux distro and I get the same problem.

Since the ConnectX-3 isn't supported anymore, this tool is basically the only way to modify the firmware configuration of this otherwise excellent card. It would be great if someone could figure this out.

I stepped through the execution with gdb and turned on debug printfs, and it fails while accessing a device register. The specific return code is 0x22, which the code maps to a #define called ME_REG_ACCESS_CONF_CORRUPT. A corrupt configuration is a potential clue, so I re-flashed the latest firmware to see if it made a difference. I also tried the mstconfig program's reset function (e.g. mstconfig -d 04:00.0 r) to see if that would clear it up. Finally, I tried an older firmware in case the problem is firmware related. Sadly, nothing I tried worked.

mstflint -d 04:00.0 q
Image type:            FS2
FW Version:            2.42.5000
FW Release Date:       5.9.2017
Product Version:       02.42.50.00
Rom Info:              type=PXE version=3.4.752
Device ID:             4099
Description:           Node             Port1            Port2            Sys image
GUIDs:                 0002c90300056aa8 0002c90300056aa9 0002c90300056aaa 0002c90300056aab
MACs:                                       0002c9a24ac0     0002c9a24ac1
VSD:
PSID:                  MT_1170110023

---
 **** I flipped on some debug in the source and ran with MFT_DEBUG=1 ****

./mstconfig -d 04:00.0 q

Device #1:
----------

Device type:    ConnectX3       
Device:         04:00.0         

Configurations:                              Next Boot
-I- Data Sent:
	======== tools_open_sriov ========
	total_vfs            : 0x0
	sriov_en             : 0x0
Sending Access Register:
Register ID: 0x9024
Register Size: 12 bytes
AccessRegister Class SMP Failed!
Mad Status: 0x00000000
Register Status: 0x00000022
-I- Data Received:
	======== tools_open_sriov ========
	total_vfs            : 0x0
	sriov_en             : 0x0
-E- Failed to query device current configuration
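
For anyone comparing debug logs across cards, here is a minimal Python sketch that pulls the register ID and status codes out of MFT_DEBUG=1 output like the dump above. The helper names (parse_access_register, describe) are made up for illustration, and the status table only contains the one code identified in this issue; it is not taken from the mstflint sources.

```python
import re

# Excerpt of the MFT_DEBUG=1 output quoted in this issue.
SAMPLE = """\
Sending Access Register:
Register ID: 0x9024
Register Size: 12 bytes
AccessRegister Class SMP Failed!
Mad Status: 0x00000000
Register Status: 0x00000022
"""

# Only the code identified in this thread; anything else is reported as unknown.
KNOWN_STATUS = {0x22: "ME_REG_ACCESS_CONF_CORRUPT"}

# Matches the three hex-valued fields; "Register Size: 12 bytes" is skipped on purpose.
FIELD_RE = re.compile(
    r"^(Register ID|Mad Status|Register Status):\s*(0x[0-9a-fA-F]+)\s*$", re.M
)


def parse_access_register(log):
    """Return the hex fields of an access-register debug dump as integers."""
    return {name: int(value, 16) for name, value in FIELD_RE.findall(log)}


def describe(log):
    """Summarise a dump in one line (hypothetical helper, not part of mstflint)."""
    fields = parse_access_register(log)
    status = fields.get("Register Status")
    if status is None:
        return "no register status found"
    name = KNOWN_STATUS.get(status, "unknown status")
    reg_id = fields.get("Register ID", 0)
    return f"register 0x{reg_id:04x} failed with 0x{status:02x} ({name})"


if __name__ == "__main__":
    print(describe(SAMPLE))
```

Piping the output of a failing mstconfig run into this makes it easier to check whether other ConnectX-3 cards fail on the same register with the same status.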

---

 mlxconfig -d /dev/mst/mt4099_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=8

Device #1:
----------

Device type:    ConnectX3
Device:         /dev/mst/mt4099_pciconf0

Configurations:                              Next Boot       New
-E- Failed to query device current configuration


Running into the same issue. Tried configuring two different ConnectX-3 cards in two different machines, both give the same error message.

I was never able to get mstconfig working for this, but I was still able to modify the CX3 configs using a combination of flint to dump the config file and mlxburn to apply it after I made my changes.

This Reddit post has a link to the .mlx firmware needed (instead of the .bin versions available directly from Mellanox's website).
https://www.reddit.com/r/homelab/comments/jloll6/comment/gbff04p/?utm_source=share&utm_medium=web2x&context=3

This STH post outlines how to modify the config.
https://forums.servethehome.com/index.php?threads/sr-iov-for-mellanox-connectx-2.12693/#post-121167

In order to run the mlxburn command I needed the mft-oem version of the mft package which contains the mic tool. You should be able to find the version for your machine somewhere here: https://linux.mellanox.com/public/repo/
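
The "make my changes" step can be scripted. Below is a rough Python sketch that sets keys inside one section of a dumped .ini while leaving every other line untouched. Everything specific in it is an assumption to check against the STH post and your own dump: the [HCA] section name, the key names total_vfs and sriov_en (borrowed from the field names in the debug output earlier in this issue), the value format, and the placeholder sample text. The function name set_ini_keys is hypothetical.

```python
def set_ini_keys(text, section, updates):
    """Set key = value pairs inside [section].

    Existing keys are replaced in place; missing keys are appended at the
    end of the section. All other lines are passed through unchanged.
    """
    out = []
    pending = dict(updates)  # keys that still have to be written
    in_section = False
    found = False

    def flush():
        # Append keys that were not already present in the section.
        for key, value in pending.items():
            out.append(f"{key} = {value}")
        pending.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            if in_section:
                flush()  # leaving the target section
            in_section = stripped[1:-1].strip() == section
            found = found or in_section
        elif in_section and "=" in stripped and not stripped.startswith(";"):
            key = stripped.split("=", 1)[0].strip()
            if key in pending:
                line = f"{key} = {pending.pop(key)}"
        out.append(line)

    if in_section:
        flush()  # target section was the last one in the file
    if not found:
        raise ValueError(f"section [{section}] not found")
    return "\n".join(out) + "\n"


# Placeholder text, NOT a real ConnectX-3 dump.
SAMPLE_INI = """\
[HCA]
example_key = 1
sriov_en = false

[IB]
other_key = 2
"""

if __name__ == "__main__":
    # Assumed key names and values; verify against your own dumped config.
    print(set_ini_keys(SAMPLE_INI, "HCA", {"total_vfs": "8", "sriov_en": "true"}))
```

A line-based edit is used rather than configparser so that comments and key order in the dump survive the round trip. The result would then be passed to mlxburn together with the .mlx file as described in the STH post.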

Hope this helps.

Thanks for the tip! Unfortunately, I’m not interested in SR-IOV, I’m interested in Wake On LAN (WOL).

While I can run flint -d /dev/mst/mt4099_pciconf0 dc, I can’t find any setting related to WOL in the resulting .ini file.


Ran across the flint/mlxburn workaround too, but it didn't work for me either. I do recall having a difficult time locating a working mic program. I gave up and bought an x510-da2 based card that just works without all the fuss.

I just found out why this is happening. It seems to be an issue with the MFT tools and the single-port CX3 (as discussed here). The solution is to generate an image with a modified ini (to enable SR-IOV), but the .mlx firmware file you can use for that is older (2.40.5030) than the latest available .bin (2.42.5000). I did just that and it worked. I can see the VFs in Proxmox now.

However, is it worth running an older firmware to make SR-IOV work, or is it better to just replace this card with a CX4 or even a dual-port CX3, which don't exhibit this issue? With those cards, you can simply use mlxconfig to edit the firmware settings directly without needing to generate a modified image and reflash.

So does anyone have a link to the 2.42.5000 mlx fw?

Hi,

I'm facing the same issue with version 4.26.0. With 4.25.0 it works without issues. The strange thing is that the PDF still says on pages 10 and 11:
Supported Interface Cards (NICs): Group I/4th Generation -> Adapter Cards: NVIDIA ConnectX-3, NVIDIA ConnectX-3 Pro.

Even with the driver, it still says:

mstconfig query
-E- Unsupported device
-E- Unsupported device

With 4.25.0:

mstconfig query | grep Device
Device #1:
Device type:    ConnectX3
Device:         /sys/bus/pci/devices/0000:03:00.0/config
Device #2:
Device type:    ConnectX3
Device:         /sys/bus/pci/devices/0000:04:00.0/config

I'm using Debian Trixie:

Linux R02 6.5.0-4-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.5.10-1 (2023-11-03) x86_64 GNU/Linux

Same for me with a ConnectX-3 Pro. Version 4.26 reports -E- Unsupported device, but version 4.25 works.