NVIDIA/nccl-tests

P2P performance with nccl-tests vs nvbandwidth

goelayu opened this issue · 1 comment

I would assume that nvbandwidth's device_to_device_memcpy_write_sm test should report bandwidth roughly comparable to nccl-tests' sendrecv_perf. If that assumption is incorrect, an explanation of why would be appreciated.

If the assumption is correct, I am seeing vastly different numbers on my setup: two H100s connected via NVLink. nvbandwidth reports 260 GB/s, whereas sendrecv_perf reports around 80 GB/s with the default configuration.

When I sweep message sizes from 4K to 4G, the bandwidth increases from 80 to 180 GB/s but seems to plateau thereafter.
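To compare the two tools on the same basis, it may help to recompute bandwidth directly from message size and elapsed time. The following is a minimal sketch, not code from either tool: the helper names `bandwidth_gbps` and `sweep_sizes` are hypothetical, the sweep assumes power-of-two steps between 4K and 4G, and 1 GB is assumed to mean 1e9 bytes (worth checking against each tool's own definition).

```python
def bandwidth_gbps(message_bytes, elapsed_seconds):
    """Return bandwidth in GB/s for one transfer.

    Assumption: 1 GB = 1e9 bytes. If a tool reports GiB/s instead,
    the two figures are not directly comparable.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return message_bytes / elapsed_seconds / 1e9


def sweep_sizes(min_bytes, max_bytes, factor=2):
    """Return message sizes from min_bytes to max_bytes, multiplying by factor."""
    if min_bytes <= 0 or factor < 2:
        raise ValueError("min_bytes must be positive and factor at least 2")
    sizes = []
    size = min_bytes
    while size <= max_bytes:
        sizes.append(size)
        size *= factor
    return sizes


if __name__ == "__main__":
    # Hypothetical sweep matching the 4K to 4G range described above.
    for size in sweep_sizes(4 * 1024, 4 * 1024**3):
        print(size)
```

Feeding each tool's measured time per transfer into `bandwidth_gbps` for the same message size shows whether the gap comes from the measurement itself or from differences in what each tool counts as transferred bytes.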

Did you find any answer to your query?