Deadlock with simple hello-world
pramodk opened this issue · 6 comments
Dear Darshan Team,
I am seeing confusing behavior and would like to check if I am missing something obvious here. I have seen #559 but I'm not sure if it's the same issue (TBH, I might be wrong, as I didn't have time to look into the details):
Here is a quick summary:
- Let's say we have a simple hello-world that is not doing anything useful:
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
#ifdef ENABLE_MPI
    // Optionally exercise MPI-IO: create and immediately close a file.
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "test.foo", MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_close(&fh);
#endif
    MPI_Finalize();
    return 0;
}
- I set up Darshan in a typical way and have everything working with Intel MPI:
module load unstable gcc intel-oneapi-mpi darshan-runtime
mpicxx -g hello_world.cpp -o hello
export DARSHAN_LOG_DIR_PATH=$PWD
DARSHAN_DIR=$(dirname `which darshan-config`)/../
export LD_PRELOAD=$DARSHAN_DIR/lib/libdarshan.so
mpirun ./hello # not relevant here: using the mpirun launcher instead of srun for convenience
produces:
$ ls -l kumbhar_hello_id254834-254834_4-13-5604-15684913809476855937_1.darshan
-r--------+ 1 kumbhar bbp 1635 Apr 13 01:33 kumbhar_hello_id254834-254834_4-13-5604-15684913809476855937_1.darshan
With ROMIO_PRINT_HINTS=1, we can see that Intel MPI uses NFS as the default ADIO driver:
key = romio_filesystem_type value = NFS:
So, if I force the GPFS driver then the program gets stuck:
export I_MPI_EXTRA_FILESYSTEM_FORCE=gpfs
As another example, let's look at HPE-MPI (MPT) library:
module load unstable gcc hpe-mpi darshan-runtime
mpicxx -g hello_world.cpp -o hello
export DARSHAN_LOG_DIR_PATH=$PWD
DARSHAN_DIR=$(dirname `which darshan-config`)/../
export LD_PRELOAD=$DARSHAN_DIR/lib/libdarshan.so
srun ./hello
this also gets stuck! I see that a .darshan_partial log is generated though:
-rw-r-----+ 1 kumbhar bbp 1673 Apr 13 01:36 kumbhar_hello_id255412-255412_4-13-5806-8325685343036153230.darshan_partial
I got confused because I have MPI-IO applications that are working fine. For example, in the above test program, let's enable the part of the code that just opens a file using MPI-IO:
$ mpicxx -g hello_world.cpp -o hello -DENABLE_MPI=1
and then srun ./hello finishes! 🤔 (at least for the few times I tried)
Launching DDT on the executable built without -DENABLE_MPI=1, the stack trace for the 2 ranks looks like below:
[stack trace screenshot]
which appears a bit confusing (?). (By the way, I quickly verified that MPI_File_write_at_all works with 0 as the count.)
I didn't spend too much time digging into the ROMIO or Darshan code. I thought I should first ask here if this is something that looks obvious to the developer team, or if you have seen this before.
Thank you in advance!
Hi @pramodk, can you try running one of your deadlocking examples with this environment variable set?
export DARSHAN_LOGHINTS=""
It's been a little while since we've encountered this, but it's possible that the ROMIO driver for the file system has a bug that's only triggered when using the hints that Darshan sets when writing the log file.
For a little more background, Darshan sets "romio_no_indep_rw=true;cb_nodes=4" by default. Taken together, these hints mean that regardless of how many ranks the application has, only 4 of them will actually open the Darshan log and act as I/O aggregators. This is helpful at scale because it keeps the cost of opening the log file from getting too high.
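As an illustration (this is not from the original report; the launcher and binary names just follow the examples above), you can set those same defaults explicitly and have ROMIO print the hints it resolves when Darshan opens its log at shutdown:
export DARSHAN_LOGHINTS="romio_no_indep_rw=true;cb_nodes=4"   # the hints Darshan uses by default
export ROMIO_PRINT_HINTS=1                                    # ask ROMIO to print the resolved hints
srun ./hello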
Out of curiosity, what does DDT say about the location of the first hang you mention (Intel MPI forcing gpfs ADIO)? Maybe it's failing the collective create of the log file since we don't see any evidence of an output log created?
The 2nd hang (MPT) you mention is clearly hanging the very first time Darshan tries to do collective writes to the log file -- log file creation clearly succeeds as you get the .darshan_partial log. Phil's suggestion has sometimes helped with this sort of thing, so that is worth trying.
(just a quick partial response, will answer other questions tomorrow)
can you try running one of your deadlocking examples with this environment variable set?
export DARSHAN_LOGHINTS=""
Yes! I confirm that changing romio_no_indep_rw via DARSHAN_LOGHINTS makes the program run successfully, i.e. the following fails:
export ROMIO_PRINT_HINTS=1
DARSHAN_LOGHINTS="romio_no_indep_rw=true" srun ./hello
...
+ DARSHAN_LOGHINTS=romio_no_indep_rw=true
+ srun ./hello
key = romio_no_indep_rw value = true
key = cb_buffer_size value = 16777216
key = romio_cb_read value = enable
key = romio_cb_write value = enable
key = cb_nodes value = 2
key = romio_cb_pfr value = disable
key = romio_cb_fr_types value = aar
key = romio_cb_fr_alignment value = 1
key = romio_cb_ds_threshold value = 0
key = romio_cb_alltoall value = automatic
key = ind_rd_buffer_size value = 4194304
key = ind_wr_buffer_size value = 524288
key = romio_ds_read value = automatic
key = romio_ds_write value = automatic
key = cb_config_list value = *:1
key = romio_filesystem_type value = GPFS: IBM GPFS
key = romio_aggregator_list value = 0 2
...
...other errors / deadlock...
...
but below succeeds:
export ROMIO_PRINT_HINTS=1
DARSHAN_LOGHINTS="romio_no_indep_rw=false" srun ./hello
srun ./hello
key = romio_no_indep_rw value = false
key = cb_buffer_size value = 16777216
key = romio_cb_read value = automatic
key = romio_cb_write value = automatic
key = cb_nodes value = 2
key = romio_cb_pfr value = disable
key = romio_cb_fr_types value = aar
key = romio_cb_fr_alignment value = 1
key = romio_cb_ds_threshold value = 0
key = romio_cb_alltoall value = automatic
key = ind_rd_buffer_size value = 4194304
key = ind_wr_buffer_size value = 524288
key = romio_ds_read value = automatic
key = romio_ds_write value = automatic
key = cb_config_list value = *:1
key = romio_filesystem_type value = GPFS: IBM GPFS
key = romio_aggregator_list value = 0 2
Wow, thanks for confirming. If you can share the exact MPI library / version you are using when you fill in more details later, that would be great. This is possibly a vendor bug that should be reported. At worst the hint should just be unsupported, not faulty.
If you would like, you can also configure Darshan with the --with-log-hints="..." configure option so that a different default is compiled in (that way the resulting library is safe to use without having to set the environment variable explicitly every time).
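For example, here is a minimal sketch of such a build (the install prefix is a placeholder, and other site-specific configure options are omitted):
./configure --with-log-hints="" --prefix=/path/to/darshan-install   # compile in an empty default hint string
make && make install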
Probably related to pmodels/mpich#6408