Mellanox/mstflint

IBM-branded cards rejected

UnitedMarsupials opened this issue · 7 comments

Hello!

I have a pair of FreeBSD-11 servers connected via InfiniBand. The cards work very nicely -- providing 5 times lower latency than Gigabit Ethernet -- which makes NFS feel like a local drive.

The cards are recognized by the ibv_devinfo utility, which comes with the OS, as:

hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.7.700
        node_guid:                      0002:c903:0005:xxxx
        sys_image_guid:                 0002:c903:0005:xxxy
        vendor_id:                      0x02c9
        vendor_part_id:                 26428
        hw_ver:                         0xA0
        board_id:                       IBM0030000009
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_DOWN (1)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             InfiniBand

                port:   2
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 1
                        port_lid:               1
                        port_lmc:               0x00
                        link_layer:             InfiniBand
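
For anyone comparing identifiers, a minimal Python sketch that pulls the identifying fields out of output like the listing above. The `parse_devinfo` helper and the trimmed `SAMPLE` text are illustrative assumptions, not part of mstflint or the FreeBSD tools; the values are copied from the listing. It also prints `vendor_part_id` in hexadecimal, the form in which PCI device IDs are usually written.

```python
import re

# Trimmed copy of the ibv_devinfo listing above (illustrative sample only).
SAMPLE = """\
hca_id: mlx4_0
        transport:                      InfiniBand (0)
        fw_ver:                         2.7.700
        vendor_id:                      0x02c9
        vendor_part_id:                 26428
        hw_ver:                         0xA0
        board_id:                       IBM0030000009
        phys_port_cnt:                  2
                port:   1
                        state:                  PORT_DOWN (1)
"""


def parse_devinfo(text):
    """Collect the device-level 'key: value' fields, stopping at the first port."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+):\s+(\S.*)$", line)
        if not match:
            continue
        key, value = match.groups()
        if key == "port":
            # Per-port attributes follow; they do not identify the card.
            break
        fields.setdefault(key, value.strip())
    return fields


info = parse_devinfo(SAMPLE)
part_id = int(info["vendor_part_id"])
print(f"board_id={info['board_id']} fw_ver={info['fw_ver']}")
print(f"vendor_id={info['vendor_id']} vendor_part_id={part_id} (0x{part_id:04x})")
```

Here the decimal part ID 26428 comes out as 0x673c, while the `IBM...` board ID is what marks the card as IBM-branded.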

Unfortunately, the mstflint utilities refuse to talk to them:

# mstflint -d pci0:4:0:0 q
FATAL - Can't find device id.
-E- Cannot open Device: pci0:4:0:0. File exists. MFE_UNSUPPORTED_DEVICE

The mstvpd identifies the card as:

% mstvpd pci0:4:0:0
ID:      FALCON QDR
PN:      46M2200
EC:      A1
SN:      YK50200000xx
V0:      PCIe Gen2 x8
V1:      N/A
YA:      N/A

How do I upgrade its firmware?

Did you try calling it with the usual

# mstflint -d mlx4_0 q
?

MT 26428 should go as high as 2.9.1000. You can try either getting the original OEM (IBM) firmware, or finding the matching Mellanox firmware for QDR cards with the A1 revision and force-burning it.

Did you try calling it with the usual

# mstflint -d mlx4_0 q
?

I did -- and it did not work. Nor do I think this method -- as opposed to specifying the direct PCI "coordinates" -- has ever worked for anyone on FreeBSD.

The direct PCI-addressing specification works -- as in "the card is detected". It just is not detected as anything the utilities know how to upgrade... And that's a shame, because, as you point out, the hardware is that of the QDR cards, A1 revision...

MT 26428 should go as high as 2.9.1000. You can try either getting the original OEM (IBM) firmware, or finding the matching Mellanox firmware for QDR cards with the A1 revision and force-burning it.

Yes, I can get the firmware -- my problem is getting mstflint utilities to recognize the card!

I don't understand -- it's been over two months now. Is this really such a difficult problem to fix? I would've thought some additional identifiers (those used by IBM-branded cards) just need to be added to some table somewhere, no?

Hi, which device do you have at pci0:4:0:0 [cx3, cx3pro, cx4, cx5, etc.]?
Which version of mstflint do you have? You can run mstflint -v

which device do you have at pci0:4:0:0 [cx3, cx3pro, cx4, cx5, etc.]?

All I know is already pasted right up there, in my first comment on...

Which version of mstflint do you have?

4.16 is the latest version currently ported to FreeBSD. As I wrote a couple of times above, it does not talk to my cards -- only the mstvpd works well enough to print the attributes.

mstflint -v

I'll try to upgrade the port to the latest version (4.18?) and reattempt. But I doubt it will change anything, unless someone deliberately sought to work on this ticket...

Hi,
It seems your device is a CX2.
BTW, it is out of support and end-of-life.
But I also see you have FW version "2.7.700";
the latest version for CX2 is 2.9.1200.
Can you try mstmcra 0xf0014? This will print the device ID.
Anyway, if you want to keep working with it, I think you need to downgrade mstflint to an older version.
Any particular reason you are still using CX2?
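
To put the version numbers in this thread side by side, a small Python sketch that compares the installed firmware (2.7.700) against the two ceilings mentioned above (2.9.1000 and 2.9.1200). The `fw_tuple` helper is a hypothetical name introduced here for illustration; it is not an mstflint function, and the sketch assumes versions are always three dot-separated integers.

```python
def fw_tuple(version):
    """Turn a dotted firmware version such as '2.7.700' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


installed = "2.7.700"              # fw_ver reported by ibv_devinfo in this thread
targets = ["2.9.1000", "2.9.1200"]  # ceilings quoted in two different comments

for target in targets:
    older = fw_tuple(installed) < fw_tuple(target)
    print(f"{installed} older than {target}: {older}")

# Plain string comparison is not safe for this: with a hypothetical
# version '2.9.700', the string '2.9.1200' sorts first because '1' < '7'.
print("2.9.1200" < "2.9.700")                      # string order
print(fw_tuple("2.9.1200") < fw_tuple("2.9.700"))  # numeric order
```

Comparing integer tuples keeps each component numeric, so 1200 correctly ranks above 700 regardless of how many digits a component has.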