Early exit for trivial NCCL collectives
ndryden opened this issue · 0 comments
ndryden commented
Right now, some NCCL operations will exit early when their count
parameter is 0. However, non-blocking collectives still set up CUDA event synchronization between streams, which will add some overhead.
For some collectives, we may also be able to exit early if the size of the communicator is 1.