xrsrke/pipegoose

Bucket small tensors and collective operations into larger ones

xrsrke opened this issue a year ago · 0 comments

xrsrke commented a year ago

Similar to https://github.com/facebookresearch/fairscale/blob/164cc0f3170b4a3951dd84dda29c3e1504ac4d6e/fairscale/internal/reduce_scatter_bucketer.py#L74, but designed in a modular way.

Store tensors in a contiguous memory space
Support partitioning a bucket across a parallelism dimension
Wait for the bucket to fill up, then run a single distributed operation on the whole bucket
Move a tensor out of a bucket
Reuse the bucket after flushing it
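
A rough sketch of how these requirements could fit together, not the planned pipegoose API. To stay dependency-free it uses the standard-library `array` in place of a flat `torch.Tensor`, and a plain callback in place of a real collective such as an all-reduce. The names `Bucket`, `add`, `copy_out`, `shard`, `flush` and `collective` are all hypothetical, and `copy_out` copies a tensor out rather than compacting the buffer.

```python
from array import array
from typing import Callable, List, Sequence, Tuple


class Bucket:
    """Hypothetical bucket: one contiguous buffer shared by many small tensors."""

    def __init__(self, capacity: int, collective: Callable[[memoryview], None]):
        self._buffer = array("d", [0.0] * capacity)  # single contiguous allocation
        self._view = memoryview(self._buffer)        # zero-copy window onto the buffer
        self._collective = collective                # stand-in for a distributed op
        self._slots: List[Tuple[int, int]] = []      # (start, length) per stored tensor
        self._used = 0

    def add(self, values: Sequence[float]) -> int:
        """Copy values into the bucket, flushing first if they would not fit."""
        n = len(values)
        if n > len(self._buffer):
            raise ValueError("tensor is larger than the bucket")
        if self._used + n > len(self._buffer):
            self.flush()
        start = self._used
        for i, v in enumerate(values):
            self._buffer[start + i] = v
        self._slots.append((start, n))
        self._used += n
        return len(self._slots) - 1  # slot id, valid until the next flush

    def copy_out(self, slot: int) -> List[float]:
        """Return an independent copy of one stored tensor."""
        start, n = self._slots[slot]
        return list(self._buffer[start:start + n])

    def shard(self, rank: int, world_size: int) -> memoryview:
        """Return the equal-sized partition of the buffer owned by `rank`."""
        if len(self._buffer) % world_size != 0:
            raise ValueError("capacity must be divisible by world_size")
        size = len(self._buffer) // world_size
        return self._view[rank * size:(rank + 1) * size]

    def flush(self) -> None:
        """Run one collective over the filled region, then reset for reuse."""
        if self._used:
            self._collective(self._view[:self._used])
        self._slots.clear()
        self._used = 0


# Usage: record what each "collective" call receives.
calls = []
bucket = Bucket(capacity=4, collective=lambda view: calls.append(list(view)))
bucket.add([1.0, 2.0])
bucket.add([3.0])
bucket.add([4.0, 5.0])  # does not fit, so the first two tensors are flushed together
```

Here the third `add` triggers a single flush covering the first two tensors, and the same buffer is then reused for the new one. In a real implementation the callback would presumably wrap a `torch.distributed` collective and the buffer would live on the device.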