[core][experimental] Support nested and dynamically sized GPU tensors via NCCL

Question

Closed this issue 24 days ago · 2 comments

Description

#45092 introduces peer-to-peer (p2p) transfer of statically sized GPU tensors via NCCL. We should also support GPU tensors that are nested inside other data held in host memory, as well as dynamically sized tensors.
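To make the request concrete, here is a minimal sketch of one way nested and dynamically sized tensors could be handled. It is not the design from #45092 or any existing Ray API. `FakeTensor`, `TensorPlaceholder`, `extract_tensors`, and `restore_tensors` are hypothetical names, and `FakeTensor` stands in for a real GPU tensor so the example runs without a GPU. The idea is to pull tensors out of the host-side structure, send the remaining skeleton plus per-tensor shape and dtype over the normal host path, and let the receiver size its buffers from that metadata before the NCCL transfer.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a GPU tensor (only shape and dtype are modeled)."""
    shape: Tuple[int, ...]
    dtype: str


@dataclass
class TensorPlaceholder:
    """Marks where tensor number `index` sat in the host-side structure."""
    index: int


def extract_tensors(value: Any, found: List[FakeTensor]) -> Any:
    """Return a copy of `value` with every tensor swapped for a placeholder."""
    if isinstance(value, FakeTensor):
        found.append(value)
        return TensorPlaceholder(len(found) - 1)
    if isinstance(value, dict):
        return {k: extract_tensors(v, found) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(extract_tensors(v, found) for v in value)
    return value


def restore_tensors(value: Any, tensors: List[FakeTensor]) -> Any:
    """Inverse of extract_tensors: put received tensors back in place."""
    if isinstance(value, TensorPlaceholder):
        return tensors[value.index]
    if isinstance(value, dict):
        return {k: restore_tensors(v, tensors) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(restore_tensors(v, tensors) for v in value)
    return value


# Sender side: a host-side dict holding two tensors of different sizes.
message = {
    "step": 3,
    "kv": [FakeTensor((2, 128), "float16"), FakeTensor((5, 128), "float16")],
}
tensors: List[FakeTensor] = []
skeleton = extract_tensors(message, tensors)
# Shapes are only known at runtime, so they travel with the skeleton.
metadata = [(t.shape, t.dtype) for t in tensors]

# Receiver side: allocate one buffer per metadata entry. In a real system
# the tensor payloads would then arrive over NCCL; here that step is assumed.
received = [FakeTensor(shape, dtype) for shape, dtype in metadata]
rebuilt = restore_tensors(skeleton, received)
```

Sending the metadata ahead of the payload is what would allow sizes to vary between calls, at the cost of an extra host-side message per transfer.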

Use case

No response

Answer 1 · 2024-05-23T22:06:19.000Z

@stephanie-wang, is this needed for integration with vLLM, or is it a usability item to make it more natural for NCCL users to move to ADAG?

Answer 2 · 2024-05-23T22:09:57.000Z

It is both!

A lighter-weight integration with vLLM is possible without this, but supporting it would mean that GPU-GPU communication can also be done through DAGs (instead of in user code).