GPU memory pool
ndryden opened this issue · 0 comments
The NCCL `Alltoall` and `Alltoallv` operations allocate temporary buffers on the GPU to handle in-place operations. Right now this directly calls `cudaMalloc`/`cudaFree` on every invocation. We should use a memory pool, since these allocation calls have a large overhead.
Right now our memory pools only support host memory, so this needs a device memory pool. We should probably use cub (or similar).
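To make the idea concrete, here is a minimal sketch of a size-binned caching pool. It is not this library's API: `CachingPool`, `HostRawAllocator`, and the commented-out `CudaRawAllocator` are hypothetical names, and the host allocator is only a stand-in so the sketch compiles without CUDA. Something like cub's `CachingDeviceAllocator` would presumably replace all of this in practice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <unordered_map>

// Hypothetical stand-in for the raw allocation calls. A device version would
// wrap cudaMalloc/cudaFree instead, e.g.:
//
//   struct CudaRawAllocator {
//     static void* allocate(std::size_t bytes) {
//       void* p = nullptr;
//       return cudaMalloc(&p, bytes) == cudaSuccess ? p : nullptr;
//     }
//     static void deallocate(void* p) { cudaFree(p); }
//   };
struct HostRawAllocator {
  static void* allocate(std::size_t bytes) { return std::malloc(bytes); }
  static void deallocate(void* ptr) { std::free(ptr); }
};

// Caches released buffers in power-of-two size bins so that repeated
// requests of similar size skip the raw allocator.
// Not thread-safe and not stream-aware; this is only a sketch.
template <typename RawAlloc>
class CachingPool {
public:
  CachingPool() = default;
  CachingPool(const CachingPool&) = delete;
  CachingPool& operator=(const CachingPool&) = delete;

  // Frees cached buffers only; buffers still held by callers are not freed.
  ~CachingPool() { release_cached(); }

  // Return a buffer of at least `bytes` bytes, reusing a cached one if possible.
  void* allocate(std::size_t bytes) {
    const std::size_t bin = round_up_pow2(bytes);
    void* ptr = nullptr;
    auto it = free_.find(bin);
    if (it != free_.end()) {
      ptr = it->second;  // Cache hit: no raw allocation needed.
      free_.erase(it);
    } else {
      ptr = RawAlloc::allocate(bin);
      if (ptr == nullptr) return nullptr;
      ++raw_allocs_;
    }
    live_[ptr] = bin;
    return ptr;
  }

  // Return a buffer to the cache instead of freeing it.
  void release(void* ptr) {
    auto it = live_.find(ptr);
    assert(it != live_.end() && "pointer was not handed out by this pool");
    if (it == live_.end()) return;  // Release builds: ignore unknown pointers.
    free_.emplace(it->second, ptr);
    live_.erase(it);
  }

  // Hand all cached buffers back to the raw allocator.
  void release_cached() {
    for (auto& entry : free_) RawAlloc::deallocate(entry.second);
    free_.clear();
  }

  std::size_t raw_allocs() const { return raw_allocs_; }
  std::size_t cached() const { return free_.size(); }

private:
  static std::size_t round_up_pow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) p <<= 1;
    return p;
  }

  std::multimap<std::size_t, void*> free_;        // bin size -> cached buffer
  std::unordered_map<void*, std::size_t> live_;   // handed-out buffer -> bin size
  std::size_t raw_allocs_ = 0;
};
```

With this shape, the in-place `Alltoall` path would call `pool.allocate(n)` and `pool.release(ptr)` where it currently calls `cudaMalloc` and `cudaFree`, so only the first call for a given size bin pays the allocation cost. A real device pool also has to make sure a released buffer is not reused while an earlier kernel or NCCL call on another stream may still be touching it, which this sketch ignores.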