Bucket small tensors and collective operations into larger ones
xrsrke opened this issue
Similar to fairscale's reduce-scatter bucketer (https://github.com/facebookresearch/fairscale/blob/164cc0f3170b4a3951dd84dda29c3e1504ac4d6e/fairscale/internal/reduce_scatter_bucketer.py#L74), but designed in a modular way. It should:
- Store tensors in one contiguous memory buffer
- Support partitioning a bucket across a parallelism dimension
- Wait for the bucket to fill up, then run a single distributed operation on the whole bucket
- Move a tensor out of a bucket
- Reuse the bucket after flushing it
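The requirements above can be sketched as a single-process Python model of the bucket lifecycle. This is a hypothetical illustration under stated assumptions, not an existing API: `Bucket`, `add`, `shard`, `take`, `reset`, `flush` and `simulated_all_reduce` are made-up names. A flat `array("f")` stands in for a contiguous tensor buffer, and an in-process elementwise sum over simulated ranks stands in for a real collective such as `torch.distributed.all_reduce`.

```python
from array import array
from typing import List, Tuple


class Bucket:
    """Hypothetical bucket: many small flat tensors packed into one contiguous float32 buffer."""

    def __init__(self, capacity: int, world_size: int = 1) -> None:
        # Assumption: capacity splits evenly across the parallelism dimension.
        if capacity % world_size != 0:
            raise ValueError("capacity must be divisible by world_size")
        self.capacity = capacity
        self.world_size = world_size                 # size of the parallelism dimension
        self.buffer = array("f", [0.0]) * capacity   # one contiguous allocation, reused across flushes
        self.offset = 0                              # index of the first free element
        self.slots: List[Tuple[int, int]] = []       # (start, length) of each stored tensor

    def is_full(self, next_len: int = 1) -> bool:
        return self.offset + next_len > self.capacity

    def add(self, values) -> int:
        """Copy a flat tensor into the next free region and return its slot id."""
        n = len(values)
        if self.is_full(n):
            raise BufferError("bucket is full; flush it before adding more")
        self.buffer[self.offset:self.offset + n] = array("f", values)
        self.slots.append((self.offset, n))
        self.offset += n
        return len(self.slots) - 1

    def shard(self, rank: int) -> memoryview:
        """Zero-copy view of the contiguous partition owned by `rank`."""
        size = self.capacity // self.world_size
        return memoryview(self.buffer)[rank * size:(rank + 1) * size]

    def take(self, slot: int) -> List[float]:
        """Move a tensor out of the bucket by copying its region."""
        start, n = self.slots[slot]
        return self.buffer[start:start + n].tolist()

    def reset(self) -> None:
        """Zero the buffer and forget all slots so the same memory can be refilled."""
        self.buffer[:] = array("f", [0.0]) * self.capacity
        self.offset = 0
        self.slots.clear()


def simulated_all_reduce(buckets: List[Bucket]) -> None:
    """Stand-in for ONE all-reduce over the filled region of every rank's bucket."""
    filled = buckets[0].offset
    if any(b.offset != filled for b in buckets):
        raise ValueError("all ranks must fill their buckets identically")
    total = [sum(b.buffer[i] for b in buckets) for i in range(filled)]
    for b in buckets:
        b.buffer[:filled] = array("f", total)


def flush(buckets: List[Bucket]) -> List[List[float]]:
    """Run one collective for the whole bucket, move every tensor out, then reuse the buffers."""
    simulated_all_reduce(buckets)
    # After an all-reduce every rank holds the same values, so rank 0's copy is returned.
    out = [buckets[0].take(s) for s in range(len(buckets[0].slots))]
    for b in buckets:
        b.reset()
    return out


if __name__ == "__main__":
    # Two simulated ranks, each with an 8-element bucket split into 2 shards of 4.
    ranks = [Bucket(capacity=8, world_size=2) for _ in range(2)]
    for r, b in enumerate(ranks):
        b.add([1.0 + r, 2.0])      # small tensor A
        b.add([3.0, 4.0, 5.0])     # small tensor B
    print(flush(ranks))            # [[3.0, 4.0], [6.0, 8.0, 10.0]]
```

The point of the design is that the two small tensors cost one collective call instead of two. Copying tensors out in `take` before `reset` is what makes reuse safe: the next fill overwrites the buffer, so anything still reading from it would see new data. Rejecting a `capacity` that does not divide by `world_size` keeps the sketch short. A real implementation would more likely pad the bucket.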