microsoft/mscclpp

[Bug] Performance downgrade for allgather kernel 2


The bus bandwidth for the second all-gather kernel decreased from 230 GB/s to 130 GB/s.

The fix is to separate the flushes out into a different call.
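The issue does not include the kernel code, so the following is only a rough sketch of what moving the flushes into their own call could look like. `MockChannel`, `sendFused`, `sendOnly`, and `flushAll` are hypothetical names made up for illustration; they are not the mscclpp API, and the mock only counts blocking flushes rather than measuring bandwidth.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a channel to one peer; NOT the mscclpp API.
struct MockChannel {
  std::size_t pendingPuts = 0;  // transfers queued but not yet confirmed complete
  std::size_t flushCount = 0;   // number of blocking flushes performed so far

  // Queue an asynchronous transfer and return immediately.
  void put(std::size_t /*offset*/, std::size_t /*bytes*/) { ++pendingPuts; }

  // Block until every queued transfer has completed.
  void flush() {
    pendingPuts = 0;
    ++flushCount;
  }
};

// Fused pattern: every send also flushes, so each call blocks once per peer.
void sendFused(std::vector<MockChannel>& peers, std::size_t bytes) {
  for (std::size_t i = 0; i < peers.size(); ++i) {
    peers[i].put(i * bytes, bytes);
    peers[i].flush();
  }
}

// Split pattern, part 1: queue the transfers without waiting for them.
void sendOnly(std::vector<MockChannel>& peers, std::size_t bytes) {
  for (std::size_t i = 0; i < peers.size(); ++i) {
    peers[i].put(i * bytes, bytes);
  }
}

// Split pattern, part 2: a separate call the caller issues only when needed.
void flushAll(std::vector<MockChannel>& peers) {
  for (MockChannel& peer : peers) {
    peer.flush();
  }
}
```

With this split, a caller can issue `sendOnly` several times and call `flushAll` once at the end, instead of paying for a blocking flush on every send.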