BUG: Crash with MPI on test/MHD/ResistiveAlfvenWave
Describe the issue:
I tried to duplicate the analysis function used in the test/MHD/ResistiveAlfvenWave test for my own use and noticed that it crashed when running with MPI. I then checked that, on my computer, test/MHD/ResistiveAlfvenWave itself also crashes when run with MPI. I'm not sure whether this is an issue with the MPI library/linking on my side or a problem in the base code's use of MPI_Reduce...
I also notice that, in the test, the method used to access the data is not the one recommended in the idefix doc (i.e. it isn't done through the host).
Error message:
------------> idefix_reduce(Analysis)...
------------> ...returned
[cabardes:69838] *** An error occurred in MPI_Reduce
[cabardes:69838] *** reported by process [3531407361,3]
[cabardes:69838] *** on communicator MPI_COMM_WORLD
[cabardes:69838] *** MPI_ERR_ARG: invalid argument of some other kind
[cabardes:69838] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[cabardes:69838] *** and potentially your MPI job)
runtime information:
Standard v2.0
//Location of the mpi library for MPI
MPI_mpi_LIBRARY:FILEPATH=/usr/local/Cellar/open-mpi/4.1.6
Hi @francois-rincon, can you specify which implementation of MPI you're using?
//Location of the mpi library for MPI
MPI_mpi_LIBRARY:FILEPATH=/usr/local/Cellar/open-mpi/4.1.6
It seems the code runs (not sure whether correctly, but it no longer crashes) with
#ifdef WITH_MPI
MPI_SAFE_CALL(MPI_Allreduce(MPI_IN_PLACE, &etot, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD));
#endif
instead of
#ifdef WITH_MPI
MPI_Reduce(MPI_IN_PLACE, &etot, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
#endif
I will try to dig into this tomorrow unless Geoff beats me to it, but in the meantime, here's a protip: in GitHub comments, you can use triple backticks to denote code blocks, and specify the language to enable syntax colouring. This is useful for sharing patches such as this one, in a way that's both easy on the eye and programmatically usable:
```patch
diff --git a/test/MHD/ResistiveAlfvenWave/setup.cpp b/test/MHD/ResistiveAlfvenWave/setup.cpp
index 3f92fc0..daa637e 100644
--- a/test/MHD/ResistiveAlfvenWave/setup.cpp
+++ b/test/MHD/ResistiveAlfvenWave/setup.cpp
@@ -19,7 +19,7 @@ void Analysis(DataBlock & data) {
}, Kokkos::Sum<double>(etot));
#ifdef WITH_MPI
- MPI_Reduce(MPI_IN_PLACE, &etot, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
+ MPI_SAFE_CALL(MPI_Allreduce(MPI_IN_PLACE, &etot, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD));
#endif
if(idfx::prank == 0) {
```
which will render as
diff --git a/test/MHD/ResistiveAlfvenWave/setup.cpp b/test/MHD/ResistiveAlfvenWave/setup.cpp
index 3f92fc0..daa637e 100644
--- a/test/MHD/ResistiveAlfvenWave/setup.cpp
+++ b/test/MHD/ResistiveAlfvenWave/setup.cpp
@@ -19,7 +19,7 @@ void Analysis(DataBlock & data) {
}, Kokkos::Sum<double>(etot));
#ifdef WITH_MPI
- MPI_Reduce(MPI_IN_PLACE, &etot, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
+ MPI_SAFE_CALL(MPI_Allreduce(MPI_IN_PLACE, &etot, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD));
#endif
if(idfx::prank == 0) {
That's probably a bug, because the official MPI specification says that MPI_IN_PLACE should only be passed as the send buffer by the root process when calling MPI_Reduce.
I suppose that some implementations still work when MPI_IN_PLACE is passed as the send buffer by all processes, but others don't.
@glesur so is François' patch an acceptable solution in your opinion ?
It's a bit of overkill, since only rank 0 needs the result, so I'd prefer a solution that keeps MPI_Reduce.
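Just to illustrate, keeping MPI_Reduce while respecting the standard would look something like this (untested sketch):
```cpp
#ifdef WITH_MPI
  // The MPI standard only allows MPI_IN_PLACE as the send buffer at the root;
  // other ranks pass their local value. recvbuf is only significant at the root,
  // so non-root ranks can pass nullptr.
  if(idfx::prank == 0) {
    MPI_SAFE_CALL(MPI_Reduce(MPI_IN_PLACE, &etot, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD));
  } else {
    MPI_SAFE_CALL(MPI_Reduce(&etot, nullptr, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD));
  }
#endif
```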
I simply noticed that Allreduce was used in other bits of analysis code, like @jeankpf's analysis code for the MTI, so I suspect this syntax is already floating around here and there...