pytorch/tensordict

[Feature Request] Support specifying group in `isend` and `irecv`

lucifer1004 opened this issue · 3 comments

Motivation

Currently, `isend` and `irecv` do not accept a `group` argument, so there is no way to choose which communication group they use when more than one group exists.

For example, a job may use an NCCL-based group for GPU communication alongside a Gloo-based group for CPU communication.

Solution

Add a `group` argument to `isend` and `irecv`, and pass it down to the underlying `torch.distributed` methods.
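A minimal sketch of what forwarding `group` could look like, not the actual tensordict implementation. The helper name `isend_all`, the `init_tag` offset scheme, and the `send_fn` hook are all hypothetical and exist only for illustration; the only real API relied on is `torch.distributed.isend(tensor, dst=..., group=..., tag=...)`.

```python
from typing import Callable, Dict, List, Optional


def isend_all(
    tensors: Dict[str, object],
    dst: int,
    *,
    group: Optional[object] = None,
    init_tag: int = 0,
    send_fn: Optional[Callable] = None,
) -> List[object]:
    """Hypothetical helper: post one non-blocking send per tensor.

    `group` is forwarded unchanged; None selects the default process group.
    """
    if send_fn is None:
        # Imported lazily so the forwarding logic can be exercised
        # without torch or an initialized process group.
        import torch.distributed as dist

        send_fn = dist.isend

    handles = []
    # Sorted keys give the sender and the receiver the same tag order.
    for offset, key in enumerate(sorted(tensors)):
        handles.append(
            send_fn(tensors[key], dst=dst, group=group, tag=init_tag + offset)
        )
    return handles
```

With a Gloo group created through `torch.distributed.new_group(backend="gloo")`, CPU tensors could then be sent with `isend_all(cpu_tensors, dst=1, group=cpu_group)` while GPU traffic stays on the default NCCL group. A matching `irecv` helper would forward `group` to `torch.distributed.irecv` in the same way. The `send_fn` hook is one possible way to unit test the forwarding without spawning multiple processes.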

Checklist

  • I have checked that there is no similar issue in the repo (required)

Good point, will take care of that today unless you want to do it :)

@vmoens I just submitted a PR but have not added tests yet, since I am not familiar with the test environment settings of this project.

I can help with that! Do you mind if I edit your PR directly?