GlobalArrays/ga

MPI-PR fails without good error messages when /dev/shm is full

Closed · 1 comment

NWChem crashed a few times and ended up filling /dev/shm. That's not ideal, but I'm less worried about that part. The hard part was that the resulting error message gives no hint of the actual cause:

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 28219 RUNNING AT klondike
=   KILLED BY SIGNAL: 7 (Bus error)
===================================================================================

I don't know where ComEx MPI-PR allocates these files, but that allocation needs error checking so that a full /dev/shm is reported clearly instead of surfacing as a bus error.

All the files in /dev/shm had cmx in their names, which I assume will lead me to the code where they are allocated:

cmx00000010000000025680000001
...
sem.cmx00000010000000025680000000
...
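
For illustration, here is a minimal sketch of the kind of check being asked for. It is not the ComEx code and not necessarily what the fix does; the helper name checked_shm_create is hypothetical. The idea: on Linux, /dev/shm is a tmpfs that allocates pages lazily, so ftruncate() can succeed on a full filesystem and the failure only shows up later as SIGBUS (signal 7) when the mapping is first touched. Calling posix_fallocate() reserves the pages up front and returns ENOSPC immediately, which can then be turned into a readable message.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of ComEx): create a POSIX shared-memory
 * segment, reserve its backing pages, and map it. Returns the mapping,
 * or NULL after printing a diagnostic to stderr. */
static void *checked_shm_create(const char *name, size_t size)
{
    assert(name != NULL && size > 0);

    int fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd == -1) {
        fprintf(stderr, "shm_open(%s) failed: %s\n", name, strerror(errno));
        return NULL;
    }

    /* posix_fallocate() returns the error number directly and does not
     * set errno. ENOSPC here means the shared-memory filesystem is full. */
    int rc = posix_fallocate(fd, 0, (off_t)size);
    if (rc != 0) {
        fprintf(stderr, "cannot reserve %zu bytes for %s: %s%s\n",
                size, name, strerror(rc),
                rc == ENOSPC ? " (is /dev/shm full?)" : "");
        close(fd);
        shm_unlink(name); /* do not leave a stale segment behind */
        return NULL;
    }

    void *ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (ptr == MAP_FAILED) {
        fprintf(stderr, "mmap(%s) failed: %s\n", name, strerror(errno));
        shm_unlink(name);
        return NULL;
    }
    return ptr;
}
```

A caller would check for NULL and abort with the printed message rather than continuing into a bus error. On older glibc versions, shm_open requires linking with -lrt.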

Fixed via #254.