xrsrke/pipegoose

Bucket small tensors and collective operations into larger ones

xrsrke opened this issue · 0 comments

xrsrke commented

Similar to https://github.com/facebookresearch/fairscale/blob/164cc0f3170b4a3951dd84dda29c3e1504ac4d6e/fairscale/internal/reduce_scatter_bucketer.py#L74, but designed in a modular way.

  • Store tensors in a contiguous memory space
  • Support partitioning a bucket across a parallelism dimension
  • Wait for the bucket to fill up, then run a single distributed operation on the whole bucket
  • Move a tensor out of a bucket
  • Reuse the bucket after flushing it
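
A rough sketch of how these pieces could fit together, to make the requirements concrete. It is not the fairscale implementation and not an existing pipegoose API: the `Bucket` class and its `add`, `shard`, `pop`, and `flush` methods are hypothetical names. To keep it runnable without PyTorch or a process group, a stdlib `array` stands in for a flat tensor and the collective is a caller-supplied callback.

```python
from array import array
from typing import Callable, Dict, List, Sequence, Tuple


class Bucket:
    """Hypothetical fixed-capacity bucket backed by one contiguous buffer."""

    def __init__(self, capacity: int, collective: Callable[[memoryview], None]):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # One contiguous allocation; a real version would use a flat tensor.
        self._buffer = array("d", [0.0] * capacity)
        self._slots: Dict[str, Tuple[int, int]] = {}  # name -> (start, end)
        self._offset = 0
        # Stand-in for a distributed op such as all-reduce or reduce-scatter.
        self._collective = collective

    def add(self, name: str, values: Sequence[float]) -> None:
        n = len(values)
        if n > self.capacity:
            raise ValueError("tensor is larger than the bucket")
        if self._offset + n > self.capacity:
            self.flush()  # not enough room left: flush first, then reuse
        start = self._offset
        for i, v in enumerate(values):
            self._buffer[start + i] = v
        self._slots[name] = (start, start + n)
        self._offset = start + n
        if self._offset == self.capacity:
            self.flush()  # bucket is full: run the collective once

    def shard(self, rank: int, world_size: int) -> memoryview:
        """View of the slice owned by `rank` along one parallelism dimension."""
        if self.capacity % world_size != 0:
            raise ValueError("capacity must be divisible by world_size")
        size = self.capacity // world_size
        return memoryview(self._buffer)[rank * size:(rank + 1) * size]

    def pop(self, name: str) -> List[float]:
        """Copy a tensor out of the bucket; its space is reclaimed on flush."""
        start, end = self._slots.pop(name)
        return self._buffer[start:end].tolist()

    def flush(self) -> None:
        if self._offset > 0:
            self._collective(memoryview(self._buffer)[: self._offset])
        self._slots.clear()
        self._offset = 0  # the same memory is reused for the next batch


flushed: List[List[float]] = []
bucket = Bucket(capacity=4, collective=lambda flat: flushed.append(flat.tolist()))
bucket.add("a", [1.0, 2.0])
bucket.add("b", [3.0])
popped = bucket.pop("b")      # copy "b" out before the bucket is flushed
bucket.add("c", [4.0])        # fills the bucket, so the collective runs once
bucket.add("d", [5.0, 6.0])   # the same buffer is reused after the flush
bucket.flush()                # flush the partially filled bucket by hand
```

In this sketch, `pop` only copies data out and drops the bookkeeping entry, so the popped values still take part in the next flush. Whether moving a tensor out should also compact the buffer is left open.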