jipolanco/PencilFFTs.jl

Combining CPU and GPU

Closed this issue · 3 comments

Since the package is now compatible with CUDA, is it possible to combine CPU and GPU to get the best possible performance?
There is a similar project in Python, implemented for quantum simulation using the Trotter expansion: https://github.com/trotter-suzuki-mpi/trotter-suzuki-mpi

Hope it can be done in Julia!

I'm not sure I understand. Do you mean partitioning a domain such that some subdomains are on CPUs and others on GPUs?

I guess it shouldn't be too hard to do. The only thing is that, not being that familiar with CUDA-aware MPI, I'm not sure how MPI handles communications between CPUs and GPUs. I know I had some issues when sending GPU arrays and receiving into CPU arrays (in PencilArrays.gather). And I guess these communications would be quite costly, so I'd need to be sure that it's worth the effort...
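To make the question concrete, here is a minimal sketch of what such a hybrid partition could look like along one dimension. It is plain Python and not PencilFFTs code: the function name `split_domain`, the worker list and the relative throughput weights are all hypothetical, chosen only for illustration.

```python
def split_domain(n_points, weights):
    """Split `n_points` grid points into contiguous ranges, one per worker,
    with sizes roughly proportional to `weights` (largest-remainder rounding)."""
    if n_points < 0 or not weights or any(w <= 0 for w in weights):
        raise ValueError("need n_points >= 0 and strictly positive weights")

    total = sum(weights)
    exact = [n_points * w / total for w in weights]  # ideal (fractional) sizes
    sizes = [int(x) for x in exact]                  # round each size down
    leftover = n_points - sum(sizes)                 # points still unassigned

    # Hand the leftover points to the workers with the largest fractional parts.
    order = sorted(range(len(weights)),
                   key=lambda i: exact[i] - sizes[i], reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1

    # Turn the sizes into contiguous index ranges.
    ranges, start = [], 0
    for size in sizes:
        ranges.append(range(start, start + size))
        start += size
    return ranges


# Hypothetical setup: two CPU ranks and one GPU rank, where the GPU is
# assumed (not measured) to be six times faster than a CPU rank.
workers = ["cpu", "cpu", "gpu"]
weights = [1.0, 1.0, 6.0]
ranges = split_domain(256, weights)

for kind, r in zip(workers, ranges):
    print(f"{kind}: indices {r.start} to {r.stop - 1}")
```

Balancing the subdomain sizes only addresses the compute side. Any transposition step would still have to move data between the CPU-resident and GPU-resident subdomains, and that is the communication cost I'm worried about above.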

Sorry about that. I may have misunderstood how trotter-suzuki-mpi works. Since communication between GPU and CPU is quite costly, it may not benefit from a hybrid kernel.

image
image

This clearly shows that the hybrid kernel is slower. But I'm not sure whether the hybrid kernel here means distributing the FFT between CPU and GPU, or distributing the Trotter steps between the two.

ref: Calderaro, Luca. "Large-scale Classical Simulation of Quantum Systems Using the Trotter-Suzuki Decomposition."

That looks interesting, thanks! I couldn't find any information on hybrid decompositions in their documentation, but I'll take a look at the thesis.