Combining CPU and GPU
Since the package is now compatible with CUDA, is it possible to combine CPU and GPU together to get the best performance?
There is a similar project in Python for quantum simulation using the Trotter expansion: https://github.com/trotter-suzuki-mpi/trotter-suzuki-mpi
Hope it can be done in Julia!
I'm not sure I understand. Do you mean partitioning a domain such that some subdomains are on CPUs and others on GPUs?
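Just to make the idea concrete, here is a rough sketch of that kind of hybrid partition using MPI.jl and CUDA.jl. The rank-based split and the subdomain size are made-up assumptions for illustration, not something the package provides:

```julia
# Rough sketch: each MPI rank owns its own subdomain, and some ranks
# use GPU arrays while others stay on the CPU. Purely illustrative.
using MPI
using CUDA

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

n = 1024  # local subdomain size (made up)

# Assumption: even ranks get a GPU, odd ranks stay on the CPU.
use_gpu = iseven(rank) && CUDA.functional()

u = use_gpu ? CUDA.zeros(Float64, n) : zeros(Float64, n)

# Each rank then advances its own subdomain; broadcasting works for both
# Array and CuArray, so the same kernel code can run on either backend.
u .= u .+ 1.0

MPI.Barrier(comm)
MPI.Finalize()
```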
I guess it shouldn't be too hard to do. The only thing is that, not being that familiar with CUDA-aware MPI, I'm not sure how MPI handles communications between CPUs and GPUs. I know I had some issues when sending GPU arrays and receiving into CPU arrays (in PencilArrays.gather). And I guess these communications would be quite costly, so I'd need to be sure that it's worth the effort...
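For what it's worth, the workaround I'd expect for mixed CPU/GPU ranks without CUDA-aware MPI is to stage GPU data through host memory before communicating. A minimal sketch with MPI.jl and CUDA.jl, where the rank roles and buffer size are assumptions for illustration:

```julia
# Sketch of staging GPU data through host memory for MPI communication,
# e.g. when the MPI library is not CUDA-aware or when the receiving rank
# only has CPU arrays. Rank roles and sizes are illustrative.
using MPI
using CUDA

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

n = 256

if rank == 0
    # GPU rank: copy device data to a host buffer, then send the host buffer.
    u_gpu = CUDA.fill(1.0, n)
    u_host = Array(u_gpu)              # device-to-host copy
    MPI.Send(u_host, comm; dest=1, tag=0)
elseif rank == 1
    # CPU rank: receive directly into a host array.
    v = zeros(Float64, n)
    MPI.Recv!(v, comm; source=0, tag=0)
end

MPI.Finalize()
```

The extra device-to-host copy is exactly the kind of communication cost that would need to be measured before deciding whether a hybrid setup is worth it.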
Sorry about that. I may have misunderstood how trotter-suzuki-mpi works; since communication between the GPU and CPU is quite costly, it may not benefit from a hybrid kernel.
This clearly shows that a hybrid kernel is slower. But I'm not sure whether the hybrid kernel here means distributing the FFT between the CPU and GPU, or distributing the Trotter steps between the two.
Ref: Calderaro, Luca. "Large-scale Classical Simulation of Quantum Systems Using the Trotter-Suzuki Decomposition."
That looks interesting, thanks! I couldn't find any information on hybrid decompositions in their documentation, but I'll take a look at the thesis.