Mellanox/mstflint

mstconfig cannot access devices with a large PCI domain

acgoldma opened this issue · 3 comments

# lspci | grep Mel
10000:01:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
# mstconfig q
-E- Failed to open the device
# mstconfig -d 10000:01:00.0 q
-E- Failed to open the device

A quick check of the code shows that the PCI domain is stored in a 16-bit value, but the domain can be larger than that. I would suggest expanding the fields to at least 32 bits.

Doesn't 10000 fit in 16 bits?
Making the change you suggested in #926 might hide the actual problem. We'll look into it further internally.

In the meantime, could you please let us know the Linux distribution and version of the lspci utility?

lspci prints the domain in hex, so the value is 0x10000 (65536 decimal), which does not fit in 16 bits.
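To illustrate the point, here is a minimal sketch of what happens when a domain like this is narrowed to 16 bits. It is not the mstflint code: `parse_bdf`, `domain_as_u16`, and `domain_as_u32` are hypothetical names made up for this example, and the parsing approach is an assumption.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from mstflint): parse a PCI address of the
 * form "domain:bus:dev.func", where every field is hexadecimal.
 * Returns 0 on success, -1 on malformed input.
 */
int parse_bdf(const char *s, unsigned *domain, unsigned *bus,
              unsigned *dev, unsigned *func)
{
    char trailing;

    /* Exactly four fields must match; a fifth match means trailing junk. */
    if (sscanf(s, "%x:%x:%x.%x%c", domain, bus, dev, func, &trailing) != 4)
        return -1;

    /* Bus is 8 bits, device is 5 bits, function is 3 bits. */
    if (*bus > 0xff || *dev > 0x1f || *func > 0x7)
        return -1;

    return 0;
}

/* Storing the domain in 16 bits silently drops everything above bit 15. */
uint16_t domain_as_u16(unsigned domain)
{
    return (uint16_t)domain;
}

/* A 32-bit field keeps the full value reported by lspci. */
uint32_t domain_as_u32(unsigned domain)
{
    return (uint32_t)domain;
}
```

For `10000:01:00.0`, the parsed domain is 0x10000. `domain_as_u16` returns 0 for that value, so a tool that stores it this way would end up looking for `0000:01:00.0`, while `domain_as_u32` keeps 0x10000.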

This system is stock RHEL 8.8.

# uname -r
4.18.0-477.10.1.el8_8.x86_64
# lspci --version
lspci version 3.7.0
# dmidecode -s system-product-name
S2600WFT
# lscpu | grep "^Model name"
Model name:          Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz

One thing to note: I was able to reset the BIOS settings, which moved the device back to domain 0000. Comparing BIOS dumps from before and after the reset, the setting responsible is possibly:

[BIOS::Advanced::PCI Configuration::Volume Management Device]
Riser1, Slot1 Volume Management Device(CPU1, IOU1)=Enabled ;Options: Disabled=00: Enabled=01
...

This no longer blocks us, but the issue remains, since this is a possible configuration.

Also, after installing MOFED, a few more tools, such as FW update, had similar issues.

pciutils/pciutils@ab61451

May, 2016 :-)

We'll fix it in the MFT codebase and then propagate the fix here. Thank you!