NVIDIA/nccl-tests

all_reduce algo factor for NVLink SHARP in-network reductions

OrenLeung opened this issue · 0 comments

Hi nccl team,

I was wondering whether the all-reduce algo factor used to obtain bus bandwidth, 2*(n-1)/n, is still accurate when in-network reductions such as SHARP or NVLink SHARP are used. With in-network reductions, there are only N+1 sends and N+1 receives instead of 2N sends and 2N receives.

If 2*(n-1)/n is not accurate, what would the proper correction factor be for in-network reductions?
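For reference, here is a minimal sketch of how I understand the factor in question to be applied today: bus bandwidth is the algorithm bandwidth (message size divided by elapsed time) multiplied by 2*(n-1)/n. The function name and arguments are hypothetical and only for illustration; this is not code from nccl-tests, and it does not attempt to model in-network reductions.

```python
def allreduce_bus_bandwidth(size_bytes, time_s, n_ranks):
    """Bus bandwidth in bytes/s, assuming the 2*(n-1)/n all-reduce factor.

    Hypothetical helper for illustration; not part of nccl-tests.
    """
    if n_ranks < 1:
        raise ValueError("n_ranks must be at least 1")
    if time_s <= 0:
        raise ValueError("time_s must be positive")

    # Algorithm bandwidth: payload size divided by elapsed time.
    alg_bw = size_bytes / time_s

    # The factor this issue asks about: 2*(n-1)/n.
    factor = 2 * (n_ranks - 1) / n_ranks

    return alg_bw * factor


if __name__ == "__main__":
    # Example: 1 GB all-reduce across 8 ranks completing in 10 ms.
    bus_bw = allreduce_bus_bandwidth(1e9, 0.01, 8)
    print(f"busbw = {bus_bw / 1e9:.2f} GB/s")
```

With 8 ranks the factor is 1.75, so an algorithm bandwidth of 100 GB/s is reported as 175 GB/s of bus bandwidth. The question is whether that multiplier still reflects the actual link traffic when the reduction happens in the switch.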
