[core][experimental] Support nested and dynamically sized GPU tensors via NCCL
Closed this issue · 2 comments
stephanie-wang commented
Description
#45092 introduces statically sized p2p transfer of GPU tensors via NCCL. We should also support dynamically sized tensors, as well as GPU tensors that are nested inside other data held in host memory.
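To make the request concrete, here is a minimal sketch of one way the nested case could be handled. It is not taken from #45092 or from Ray's implementation. The idea is to walk the host-side value, pull out each tensor, and leave a placeholder behind. The skeleton and the per-tensor shape/dtype metadata could then travel over a host channel while the tensors themselves go over NCCL, and sending the metadata first is what would let the receiver allocate buffers for dynamically sized tensors. `FakeTensor`, `TensorPlaceholder`, `extract_tensors`, and `restore_tensors` are hypothetical names used only for illustration; `FakeTensor` stands in for a real GPU tensor so the sketch runs without a GPU.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a GPU tensor (illustration only)."""
    shape: Tuple[int, ...]
    dtype: str


@dataclass(frozen=True)
class TensorPlaceholder:
    """Marks where the tensor at position `index` was removed."""
    index: int


def extract_tensors(value: Any, tensors: List[FakeTensor]) -> Any:
    """Return a copy of `value` with tensors swapped for placeholders.

    Extracted tensors are appended to `tensors` in traversal order.
    """
    if isinstance(value, FakeTensor):
        tensors.append(value)
        return TensorPlaceholder(len(tensors) - 1)
    if isinstance(value, dict):
        return {k: extract_tensors(v, tensors) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(extract_tensors(v, tensors) for v in value)
    return value


def restore_tensors(value: Any, tensors: List[FakeTensor]) -> Any:
    """Inverse of extract_tensors: put received tensors back in place."""
    if isinstance(value, TensorPlaceholder):
        return tensors[value.index]
    if isinstance(value, dict):
        return {k: restore_tensors(v, tensors) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(restore_tensors(v, tensors) for v in value)
    return value


# Sender side: split the payload into a host-side skeleton and tensors.
payload = {"step": 3, "kv": [FakeTensor((2, 4), "float16"), FakeTensor((7,), "int64")]}
tensors: List[FakeTensor] = []
skeleton = extract_tensors(payload, tensors)

# Metadata the receiver would need before it can allocate receive buffers.
metadata = [(t.shape, t.dtype) for t in tensors]

# Receiver side: once the tensors have arrived, rebuild the original value.
rebuilt = restore_tensors(skeleton, tensors)
```

In a real DAG the `tensors` list would be sent with NCCL and `skeleton` plus `metadata` with the regular serialization path; here both sides run in one process purely to show the round trip.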
Use case
No response
anyscalesam commented
@stephanie-wang is this something that is needed for integration with vLLM or is this a usability item to make it more natural for NCCL users to convert to ADAG?
stephanie-wang commented
It is both!
A lighter-weight integration with vLLM is possible without this, but supporting it would mean that GPU-GPU communication can also be done through DAGs (instead of user code).