ray-project/ray

[core][experimental] Support broadcast NCCL ops in accelerated DAG

Description

When the same GPU tensor is sent to multiple readers, we should use `ncclBroadcast` under the hood, rather than one transfer per reader, to reduce transfer time.
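To make the request concrete, here is a minimal sketch of the decision being proposed: pick a single broadcast when one writer has several readers, and fall back to a point-to-point send otherwise. It is an illustration only. `TransferOp` and `plan_transfers` are hypothetical names, not part of Ray's API, and the sketch makes no NCCL calls; it only models which operation would be chosen.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TransferOp:
    """Hypothetical record of one planned GPU transfer (not a Ray type)."""
    kind: str               # "send" (point-to-point) or "broadcast"
    src: int                # rank of the writer
    dsts: Tuple[int, ...]   # ranks of the readers covered by this op


def plan_transfers(writer_rank: int, reader_ranks: List[int]) -> List[TransferOp]:
    """Choose the transfer ops for sending one tensor to its readers.

    Assumption: a reader on the writer's own rank needs no transfer,
    and duplicate reader ranks are collapsed.
    """
    readers = sorted(set(reader_ranks) - {writer_rank})
    if not readers:
        return []
    if len(readers) == 1:
        # A single reader gains nothing from a collective.
        return [TransferOp("send", writer_rank, (readers[0],))]
    # Several readers: one collective op instead of len(readers) sends.
    return [TransferOp("broadcast", writer_rank, tuple(readers))]
```

With three readers, `plan_transfers(0, [1, 2, 3])` yields one `broadcast` op covering ranks 1, 2 and 3, where a per-reader approach would issue three separate sends.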

Use case

No response