[Bug] Performance regression for allgather kernel 2
Closed this issue · 0 comments
Binyang2014 commented
The bus bandwidth for the second all-gather kernel decreased from 230 GB/s to 130 GB/s.
To fix it, we need to separate the flushes out into a different call.
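
For illustration only, here is a toy Python model of why the placement of flushes could matter. It is not the kernel code and not the library's API: `ToyChannel`, `put`, `flush`, `put_and_flush`, `fused`, and `separated` are all hypothetical names, and the model only counts how often the caller blocks. It assumes the slow path flushes as part of every transfer, which this issue does not state explicitly.

```python
class ToyChannel:
    """Hypothetical stand-in for a transfer channel; not a real library API."""

    def __init__(self):
        self.pending = 0  # transfers issued but not yet confirmed complete
        self.stalls = 0   # number of times the caller blocked inside flush()

    def put(self, chunk):
        # Issue an asynchronous transfer and return immediately.
        self.pending += 1

    def flush(self):
        # Block until every pending transfer has completed.
        if self.pending:
            self.stalls += 1
            self.pending = 0

    def put_and_flush(self, chunk):
        # Fused variant: every transfer pays for its own flush.
        self.put(chunk)
        self.flush()


def fused(chunks):
    """Flush as part of every transfer; returns the number of stalls."""
    ch = ToyChannel()
    for c in chunks:
        ch.put_and_flush(c)
    return ch.stalls


def separated(chunks):
    """Issue all transfers first, then flush once in its own call."""
    ch = ToyChannel()
    for c in chunks:
        ch.put(c)
    ch.flush()
    return ch.stalls
```

In this model, the fused path blocks once per chunk while the separated path blocks once in total, so the transfers are free to overlap. Whether that accounts for the full 230 GB/s to 130 GB/s gap would have to be confirmed by profiling the real kernel.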