Mellanox/mstflint

mstflint: Operation not supported. MFE_CR_ERROR

davrot opened this issue · 2 comments

Hi,

I would like to update my Mellanox MHES14-XTC cards (MT_03F0110001) from fw 1.0.800 to 1.2.000, and I have a suitable firmware image from https://network.nvidia.com/support/firmware/ih3lx/ . :-)

However, mstflint doesn't give me much love:

root@gate2:~# lspci | grep InfiniBand:
07:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev a0)

root@gate2:/# mstflint -d 07:00.0 query
-E- Cannot open Device: 07:00.0. Operation not supported. MFE_CR_ERROR

root@gate2:~# mstflint -d /proc/bus/pci/07/00.0 query
-E- Cannot open Device: /proc/bus/pci/07/00.0. Operation not supported. MFE_CR_ERROR

Is there maybe a parameter I missed? Thanks!

best wishes
David


root@gate2:/sys/class/infiniband# uname -a
Linux gate2 6.7.7-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Mar 1 16:53:59 UTC 2024 x86_64 GNU/Linux

root@gate2:/sys/class/infiniband# mstflint -v
mstflint, mstflint 4.24.0, built on Jul 20 2023, 00:00:00. Git SHA Hash: N/A

root@gate2:~# dnf -y install mstflint rdma-core opensm
root@gate2:~# systemctl enable opensm.service
root@gate2:~# systemctl start opensm.service

root@gate2:~# cat /sys/class/infiniband/ibp7s0/board_id
MT_03F0110001

root@gate2:~# cat /sys/class/infiniband/ibp7s0/fw_ver
1.0.800

root@gate2:~# cat /sys/class/infiniband/ibp7s0/hw_rev
a0

root@gate2:~# cat /sys/class/infiniband/ibp7s0/hca_type
MT25204

root@gate2:~# cat /sys/class/infiniband/ibp7s0/node_desc
gate2 ibp7s0

root@gate2:~# cat /sys/class/infiniband/ibp7s0/node_guid
0002:c902:0024:c098

root@gate2:~# cat /sys/class/infiniband/ibp7s0/node_type
1: CA

root@gate2:~# cat /sys/class/infiniband/ibp7s0/uevent
NAME=mthca0

root@gate2:/# cat /sys/class/infiniband/ibp7s0/ports/1/rate
10 Gb/sec (4X SDR)

root@gate2:/# cat /sys/class/infiniband/ibp7s0/ports/1/state
4: ACTIVE

root@gate2:/# cat /sys/class/infiniband/ibp7s0/ports/1/phys_state
5: LinkUp
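
For anyone who wants to collect the same sysfs values in one go, the individual cat calls above can be wrapped in a small loop. This is only a sketch: `dump_ib_attrs` is a hypothetical helper name (not part of rdma-core or mstflint), and the attribute list is simply the set of files read above, assuming a single port.

```shell
# Print selected sysfs attributes of an InfiniBand device, one per line.
# $1 is the device directory, e.g. /sys/class/infiniband/ibp7s0
# (hypothetical helper; the attribute list mirrors the files read above).
dump_ib_attrs() {
    dev_dir=$1
    for attr in board_id fw_ver hw_rev hca_type node_desc node_guid node_type \
                ports/1/rate ports/1/state ports/1/phys_state; do
        if [ -r "$dev_dir/$attr" ]; then
            printf '%s: %s\n' "$attr" "$(cat "$dev_dir/$attr")"
        else
            # Keep going so one missing file does not hide the rest.
            printf '%s: (not readable)\n' "$attr"
        fi
    done
}

# Usage on the host from this report:
#   dump_ib_attrs /sys/class/infiniband/ibp7s0
```

Missing attributes are reported rather than treated as errors, so the helper can be pasted into a report even when a driver does not expose every file.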

root@gate2:/# ethtool -i ibp7s0
driver: ib_ipoib
version: 6.7.7-200.fc39.x86_64
firmware-version: 1.0.800
expansion-rom-version:
bus-info: 0000:07:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

dmesg:
[ 5.365221] ib_mthca: Mellanox InfiniBand HCA driver v1.0 (April 4, 2008)
[ 5.365223] ib_mthca: Initializing 0000:07:00.0
[ 5.365241] ib_mthca 0000:07:00.0: enabling device (0000 -> 0002)
[ 8.075911] ib_mthca 0000:07:00.0: HCA FW version 1.0.800 is old (1.2.000 is current).
[ 8.075916] ib_mthca 0000:07:00.0: If you have problems, try updating your HCA FW.
[ 7086.562904] ib_mthca 0000:07:00.0 ibp7s0: renamed from ib0
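
The old-versus-current check that the driver reports above can also be scripted, which helps when several hosts carry the same card. A minimal sketch, assuming GNU `sort -V` is available (it is part of coreutils on Fedora); `fw_older_than` is a hypothetical helper name, not something shipped with the driver or mstflint.

```shell
# Return success (exit status 0) if version $1 is strictly older than $2.
# Relies on GNU sort's version ordering (-V); hypothetical helper.
fw_older_than() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage on the host from this report:
#   current=$(cat /sys/class/infiniband/ibp7s0/fw_ver)
#   if fw_older_than "$current" 1.2.000; then
#       echo "firmware $current is older than 1.2.000"
#   fi
```

With the values from the dmesg lines, `fw_older_than 1.0.800 1.2.000` succeeds, while comparing 1.2.000 against itself fails.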

root@gate2:/sys/class/infiniband# ibstat
CA 'ibp7s0'
CA type: MT25204
Number of ports: 1
Firmware version: 1.0.800
Hardware version: a0
Node GUID: 0x0002c9020024c098
System image GUID: 0x0002c9020024c09b
Port 1:
State: Active
Physical state: LinkUp
Rate: 10
Base lid: 3
LMC: 0
SM lid: 1
Capability mask: 0x02590a6a
Port GUID: 0x0002c9020024c099
Link layer: InfiniBand

root@gate2:/# ibstatus
Infiniband device 'ibp7s0' port 1 status:
default gid: fe80:0000:0000:0000:0002:c902:0024:c099
base lid: 0x3
sm lid: 0x3
state: 4: ACTIVE
phys state: 5: LinkUp
rate: 10 Gb/sec (4X SDR)
link_layer: InfiniBand

root@gate2:/sys/class/infiniband# ibhosts
Ca : 0x0002c9020024bfa4 ports 1 "MT25204 InfiniHostLx Mellanox Technologies"
Ca : 0x0002c9020024bfe8 ports 1 "gate1 ibp7s0"
Ca : 0x0002c9020024bf94 ports 1 "MT25204 InfiniHostLx Mellanox Technologies"
Ca : 0x0002c9020024c098 ports 1 "gate2 ibp7s0"

I am now using the old 1.4 version of mstflint.

However, I have to use the -skip_is parameter.

./mstflint -d 07:00.0 -skip_is -i ./fw-25204-1_2_000-MHES14-XTC_A1-A3.bin burn

Warning: memory access to device 07:00.0 failed: Operation not permitted.
Warning: Fallback on IO: much slower, and unsafe if device in use.

Current FW version on flash:  N/A
New FW version:               1.2.0

Read and verify Invariant Sector - DIFF DETECTED

Invariant sector mismatch. Address 0x40 in image: 0x15000720, while on flash: 0x14000720

The invariant sector can not be burnt in a failsafe manner.
You can continue the FW update without burning the invariant sector.
See FW release notes for details on invariant sector updates.

Do you want to continue ? (y/n) [n] : y
Read and verify PPS/SPS on flash - OK
Burning second FW image without signatures - OK
Restoring second signature - OK
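
To see how far apart the two invariant sector words reported at address 0x40 really are, they can be XORed with plain shell arithmetic. This is just a sketch for reading the mismatch message; `xor_words` is a hypothetical helper name, and the output says nothing about what the differing bit means in the firmware layout.

```shell
# Print the XOR of two 32-bit words as zero-padded hex.
# Hypothetical helper for reading the mismatch message above.
xor_words() {
    printf '0x%08x\n' $(( $1 ^ $2 ))
}

# Image word vs. flash word at address 0x40, taken from the output above.
xor_words 0x15000720 0x14000720   # prints 0x01000000
```

The two words differ in a single bit (bit 24); everything else at that address is identical between the image and the flash.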

The firmware image in question is dated November 2007:

$ unzip -l fw-25204-1_2_000-MHES14-XTC_A1-A3.bin.zip
Archive:  fw-25204-1_2_000-MHES14-XTC_A1-A3.bin.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
   923152  2007-11-28 01:25   fw-25204-1_2_000-MHES14-XTC_A1-A3.bin
---------                     -------
   923152                     1 file

It sounds like mstflint doesn't support such old devices anymore.
I did my best to puzzle out when support was removed, but had no luck. All the engineers involved left the company a long time ago.

My advice would be to use an mstflint version from the relevant time frame.