bsub jobs hang
Closed this issue · 2 comments
jgarofoli commented
This is kind of critical/urgent.
jgarofoli commented
I don't even know why they hang at the end, so more logging/debugging is needed. Not reproducible in standard mpirun jobs.
jgarofoli commented
No longer hanging, and any future hangs should now be reported in the log file. I suspect there are inconsistencies in each rank's list of subjobs. The problem would be that different subjobs try to reduce at the same time and then just wait on each other.
I'm closing this.
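
For anyone hitting this again, here is a rough sketch of the kind of check that could confirm or rule out the suspected cause. It is not the code from this project: `find_subjob_mismatch` and the subjob names are made up for illustration, and it assumes each rank's ordered subjob list has already been collected in one place (for example with something like mpi4py's `comm.gather` on rank 0). It reports the first position where the ranks disagree, which is where mismatched reduce calls would block.

```python
def find_subjob_mismatch(subjobs_by_rank):
    """Return (index, {rank: name}) for the first position where the
    ranks' subjob lists differ, or None if all lists are identical.

    subjobs_by_rank is a hypothetical mapping {rank: [subjob names in
    the order that rank will reduce them]}.
    """
    ranks = sorted(subjobs_by_rank)
    longest = max((len(subjobs_by_rank[r]) for r in ranks), default=0)
    for i in range(longest):
        # A rank whose list is too short is recorded as None at this index.
        names = {
            r: (subjobs_by_rank[r][i] if i < len(subjobs_by_rank[r]) else None)
            for r in ranks
        }
        if len(set(names.values())) > 1:
            return i, names
    return None


if __name__ == "__main__":
    # Illustrative data only: rank 1 has two subjobs in a different order.
    gathered = {
        0: ["mean", "histogram", "projection"],
        1: ["mean", "projection", "histogram"],
    }
    mismatch = find_subjob_mismatch(gathered)
    if mismatch is not None:
        index, names = mismatch
        print(f"subjob lists diverge at position {index}: {names}")
```

Running a check like this before the reduce step, and writing the result to the log, would turn a silent hang into an explicit error message.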