LLNL/Aluminum

GPU memory pool

ndryden opened this issue · 0 comments

The NCCL Alltoall and Alltoallv operations allocate temporary buffers on the GPU to handle in-place operations. Right now they call cudaMalloc/cudaFree directly. We should use a memory pool instead, since cudaMalloc and cudaFree have a large overhead.

Right now our memory pools only support host memory. We should probably use CUB (or a similar caching allocator) for device memory.
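
For illustration, here is a minimal sketch of the kind of caching pool this would need. It is not Aluminum's existing pool and not CUB's API: `CachingPool`, `allocate`, `release`, and the 256-byte minimum bin are hypothetical names and choices. The backend allocate/free functions are passed in, so the same logic could wrap cudaMalloc/cudaFree on the device; the usage below substitutes `std::malloc`/`std::free` so the sketch runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <map>
#include <mutex>
#include <unordered_map>
#include <utility>

// Hypothetical size-binned caching pool (illustration only).
// Freed blocks are kept and reused for later requests in the same bin,
// so the expensive backend allocator is only called on a cache miss.
class CachingPool {
public:
  using AllocFn = std::function<void*(std::size_t)>;
  using FreeFn = std::function<void(void*)>;

  CachingPool(AllocFn alloc, FreeFn dealloc)
      : alloc_(std::move(alloc)), free_(std::move(dealloc)) {}

  // Returns cached blocks to the backend. Blocks still checked out
  // are not freed here; callers must release them first.
  ~CachingPool() { clear(); }

  CachingPool(const CachingPool&) = delete;
  CachingPool& operator=(const CachingPool&) = delete;

  // Get a block of at least `bytes` bytes, reusing a cached one if possible.
  void* allocate(std::size_t bytes) {
    const std::size_t bin = round_up_pow2(bytes);
    std::lock_guard<std::mutex> lock(mutex_);
    void* ptr = nullptr;
    auto it = cached_.find(bin);
    if (it != cached_.end()) {
      ptr = it->second;  // Cache hit: no backend call.
      cached_.erase(it);
    } else {
      ptr = alloc_(bin);  // Cache miss: fall through to the backend.
      ++backend_allocs_;
      if (ptr == nullptr) return nullptr;
    }
    live_[ptr] = bin;
    return ptr;
  }

  // Return a block to the pool instead of freeing it.
  void release(void* ptr) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = live_.find(ptr);
    assert(it != live_.end() && "pointer was not allocated by this pool");
    cached_.emplace(it->second, ptr);
    live_.erase(it);
  }

  // Free every cached (not checked-out) block through the backend.
  void clear() {
    std::lock_guard<std::mutex> lock(mutex_);
    for (auto& entry : cached_) free_(entry.second);
    cached_.clear();
  }

  // Number of times the backend allocator was actually called.
  std::size_t backend_allocs() const { return backend_allocs_; }

private:
  // Round up to a power of two, with a 256-byte minimum bin (arbitrary choice).
  static std::size_t round_up_pow2(std::size_t n) {
    std::size_t p = 256;
    while (p < n) p <<= 1;
    return p;
  }

  AllocFn alloc_;
  FreeFn free_;
  std::mutex mutex_;
  std::multimap<std::size_t, void*> cached_;     // bin size -> free block
  std::unordered_map<void*, std::size_t> live_;  // checked-out block -> bin size
  std::size_t backend_allocs_ = 0;
};
```

Usage with host memory as a stand-in for the device backend:

```cpp
CachingPool pool([](std::size_t n) { return std::malloc(n); },
                 [](void* p) { std::free(p); });
void* tmp = pool.allocate(1000);  // backend call (1024-byte bin)
pool.release(tmp);
void* again = pool.allocate(700); // same bin, reuses the cached block
pool.release(again);
```

One caveat for a real device pool: a released buffer may still be in use by work queued on a CUDA stream, so reuse has to be ordered against that stream. CUB's `cub::CachingDeviceAllocator` handles this by associating each cached block with the stream it was allocated on, which is one reason to prefer it over a hand-rolled pool like the sketch above.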