Implement cross-device reductions using Jet

Question

Opened this issue 3 years ago · 0 comments

Feature description

Jet currently supports slicing a tensor network so that the work can be distributed over multiple devices: each device contracts its part of the network, whether it is a set of CPU cores (BLAS-backed contractions) or a GPU (cuTENSOR-backed contractions). However, no mechanism exists yet for reducing the partial results across different device types. The goal is to implement a reduction task that enables efficient CPU-GPU and GPU-GPU contractions.
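To make the idea concrete, here is a minimal sketch of slicing followed by a reduction, written in plain Python rather than against Jet's API. It assumes the simplest possible network, a matrix product sliced over the shared index, and uses threads as stand-ins for devices. The names `contract_slice`, `tree_reduce`, and `sliced_contract` are hypothetical and do not exist in Jet.

```python
from concurrent.futures import ThreadPoolExecutor


def contract_slice(a, b, k):
    """Contract A[i][k] * B[k][j] for one fixed value k of the sliced index."""
    return [[a[i][k] * b[k][j] for j in range(len(b[0]))] for i in range(len(a))]


def add(x, y):
    """Element-wise sum of two partial results with the same shape."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def tree_reduce(partials):
    """Sum the partial results pairwise until a single tensor remains."""
    while len(partials) > 1:
        nxt = [add(partials[i], partials[i + 1])
               for i in range(0, len(partials) - 1, 2)]
        if len(partials) % 2:
            # An odd element has no partner in this round; carry it forward.
            nxt.append(partials[-1])
        partials = nxt
    return partials[0]


def sliced_contract(a, b, workers=2):
    """Contract each slice on a worker (a stand-in for a device), then reduce."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda k: contract_slice(a, b, k), range(len(b))))
    return tree_reduce(partials)


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
result = sliced_contract(a, b)
print(result)
```

In a real mixed-device setting the `add` step is where the cross-device cost appears, since one operand may have to be copied between host and device memory before the sum can be taken. The pairwise structure is one possible choice; it keeps the number of sequential reduction rounds logarithmic in the number of slices.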

Context: Enable reductions across mixed devices for tensor network contractions.
Factors: Performance should be equal to or better than a single device alone.

Tasks:

Implement a reduction task for CPU-GPU and GPU-GPU contractions, based on the current main branch.
Add tests to verify the reduction is performed correctly.
Compare runtimes against the CPU-only (default) backend.
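For the testing task, one point worth keeping in mind is that a reduction sums partial results in a different order than a single-device contraction, so floating-point results will generally agree only up to rounding. The sketch below illustrates a tolerance-based check in plain Python; it does not use Jet, and `direct_contract`, `reduced_contract`, and `allclose` are hypothetical helpers with tolerances chosen only for illustration.

```python
import math
import random


def direct_contract(a, b):
    """Reference result: sum over the shared index in its natural order."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def reduced_contract(a, b):
    """Accumulate per-slice contributions in reverse order, mimicking a
    reduction that combines partial results in a different sequence."""
    rows, cols = len(a), len(b[0])
    total = [[0.0] * cols for _ in range(rows)]
    for k in reversed(range(len(b))):
        for i in range(rows):
            for j in range(cols):
                total[i][j] += a[i][k] * b[k][j]
    return total


def allclose(x, y, rel_tol=1e-9, abs_tol=1e-12):
    """True if x and y have the same shape and agree element-wise within tolerance."""
    if len(x) != len(y) or any(len(rx) != len(ry) for rx, ry in zip(x, y)):
        return False
    return all(math.isclose(p, q, rel_tol=rel_tol, abs_tol=abs_tol)
               for rx, ry in zip(x, y) for p, q in zip(rx, ry))


rng = random.Random(0)  # fixed seed so the check is reproducible
a = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(4)]
b = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(8)]
matches = allclose(direct_contract(a, b), reduced_contract(a, b))
```

Comparing with exact equality instead would make such a test sensitive to the reduction order, which may legitimately differ between CPU-GPU and GPU-GPU runs.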