LLNL/Aluminum

Early exit for trivial NCCL collectives

ndryden opened this issue · 0 comments

Right now, some NCCL operations exit early when their count parameter is 0. However, non-blocking collectives still set up CUDA event synchronization between streams before reaching that check, which adds overhead even though no data is transferred.

For some collectives, we may also be able to exit early if the size of the communicator is 1, since there is nothing to exchange with other ranks.
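A rough sketch of what such a check might look like, to make the two cases concrete. Everything here is hypothetical: `EarlyExit`, `classify_trivial`, and the `in_place` flag are illustrative names and are not part of Aluminum's or NCCL's API. It assumes that on a single-rank communicator a collective reduces to either a no-op (in-place) or a local copy from the send buffer to the receive buffer (out-of-place); which collectives actually qualify would still need to be decided per operation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classification of a collective call; not Aluminum's API.
enum class EarlyExit {
  None,      // Not trivial: run the full collective, including stream sync.
  CopyOnly,  // Single rank, out-of-place: a local buffer copy is enough.
  Skip       // Nothing to do: return before any events are recorded.
};

// Decide whether a collective is trivial from its arguments alone.
// count:     number of elements passed to the collective
// comm_size: number of ranks in the communicator (must be >= 1)
// in_place:  true if the send and receive buffers are the same
inline EarlyExit classify_trivial(std::size_t count, int comm_size,
                                  bool in_place) {
  assert(comm_size >= 1);
  if (count == 0) {
    // No data at all, regardless of communicator size.
    return EarlyExit::Skip;
  }
  if (comm_size == 1) {
    // Assumption: with one rank the result equals the input, so an
    // in-place call is a no-op and an out-of-place call is just a copy.
    return in_place ? EarlyExit::Skip : EarlyExit::CopyOnly;
  }
  return EarlyExit::None;
}
```

The idea would be for the non-blocking wrappers to run a check like this first and only set up the CUDA event synchronization when it returns `EarlyExit::None` (or possibly `CopyOnly`, since the copy would still need to be ordered on the right stream).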