Deadlocks when using non-blocking collectives with OpenMPI
mirenradia opened this issue · 0 comments
Several users (including myself) have occasionally encountered deadlocks with OpenMPI that seem to stem from the non-blocking MPI collectives in the AMRInterpolator and are resolved by the changes in this commit. However, the issue does not always occur, and there may be other factors at play.
In my experience the deadlock doesn't seem to occur straight away but rather at the next MPI collective call after the first MPI_Waitall in MPIContext::asyncEnd(), whether that be in writing an HDF5 file or the next use of the AMRInterpolator.
I have experienced this problem with OpenMPI 4.0.5.