all_reduce algo factor for NVLink SHARP in-network reductions
OrenLeung opened this issue · 0 comments
OrenLeung commented
Hi NCCL team,
I was wondering whether the all-reduce algorithm factor used to obtain bus bandwidth, `2*(n-1)/n`, is still accurate when in-network reductions such as SHARP or NVLink SHARP are used. With in-network reductions, there are only N+1 sends and N+1 receives instead of 2N sends and 2N receives.

If `2*(n-1)/n` is not accurate, what would the proper correction factor be for in-network reductions?
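
For reference, here is a minimal sketch of how I understand the factor being applied today: algorithm bandwidth is the message size divided by elapsed time, and bus bandwidth is that value multiplied by `2*(n-1)/n`. The function names are hypothetical and chosen only for illustration, and the sketch does not model in-network reductions at all, since that correction is exactly what the question is about.

```python
def allreduce_busbw_factor(n: int) -> float:
    """Return the all-reduce factor 2*(n-1)/n for n ranks (the factor asked about above)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 2 * (n - 1) / n


def bus_bandwidth(size_bytes: float, time_s: float, n: int) -> float:
    """Convert a measured all-reduce into bus bandwidth in bytes per second.

    Assumes algorithm bandwidth = size / time, scaled by the 2*(n-1)/n factor.
    No adjustment is made for SHARP or NVLink SHARP.
    """
    if time_s <= 0:
        raise ValueError("time_s must be positive")
    algbw = size_bytes / time_s
    return algbw * allreduce_busbw_factor(n)


if __name__ == "__main__":
    # Hypothetical measurement: 1 GB all-reduce across 8 ranks in 10 ms.
    print(allreduce_busbw_factor(8))
    print(bus_bandwidth(1e9, 0.010, 8))
```

As n grows the factor approaches 2, which is where the "2N sends, 2N receives" intuition comes from.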