Mellanox/nv_peer_memory

ibv_reg_mr_iova2 failed with error bad address

Opened this issue · 2 comments

env:

  1. ubuntu2204
  2. cuda_12.2.1_535.86.10
  3. MLNX_OFED_LINUX-23.04-1.1.3.0
  4. nccl2_2.18.3
  5. cx7 firmware 28.37.1014

ib_write_bw succeeds:

ib_write

nccl topo:
nccl topo

error message:

image

This error occurs on a Mellanox CX7.

It was an environment problem.
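The thread does not say which part of the environment was at fault. One thing that is quick to rule out when registering GPU memory fails with "bad address" is a missing GPUDirect peer-memory kernel module. The following is a minimal sketch, not the fix used here: it assumes a Linux host and that the module appears in /proc/modules as either nvidia_peermem (shipped with recent NVIDIA drivers) or nv_peer_mem (the module from this repository). The helper names are hypothetical.

```python
from pathlib import Path

# Assumed module names: nvidia_peermem (bundled with recent NVIDIA drivers)
# and nv_peer_mem (the older out-of-tree module from this repository).
PEER_MEM_MODULES = ("nvidia_peermem", "nv_peer_mem")


def loaded_peer_mem_modules(proc_modules_text: str) -> list[str]:
    """Return the peer-memory modules named in /proc/modules-style text."""
    # The first whitespace-separated field of each line is the module name.
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in PEER_MEM_MODULES if name in loaded]


def check_host(path: str = "/proc/modules") -> list[str]:
    """Read the kernel's module list and report loaded peer-memory modules."""
    return loaded_peer_mem_modules(Path(path).read_text())


if __name__ == "__main__":
    try:
        found = check_host()
    except OSError as exc:
        # Not a Linux host, or /proc is unavailable.
        print(f"could not read module list: {exc}")
    else:
        if found:
            print("peer-memory module loaded:", ", ".join(found))
        else:
            print("no peer-memory module loaded")
```

If neither module is listed, loading one (for example with `modprobe nvidia_peermem`) and retrying is a reasonable first step. A loaded module does not rule out other environment causes.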

Hi @ThkerLee,

I have a similar problem: I need to assign a GPU buffer to the completion queue in ibv_create_cq. However, it fails with a "bad address" error when given a GPU address but succeeds with a CPU address. Can you please explain how you solved the issue you mentioned?

Many thanks