jgarofoli/LCLS-data-summary

bsub jobs hang

Closed this issue · 2 comments

This is kind of critical/urgent.

I don't even know why they hang at the end, so more logging/debugging is needed. Not reproducible in standard mpirun jobs.
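One way to get more information out of a hung job is a watchdog that dumps every thread's traceback to the log if the process is still running after a timeout. This is only a sketch using Python's standard `faulthandler` module; the helper names `arm_hang_watchdog` and `disarm_hang_watchdog`, the log path, and the timeout are assumptions for illustration, not code from this repository.

```python
import faulthandler


def arm_hang_watchdog(log_path, timeout_s=600):
    """Dump all thread tracebacks to log_path if still running after timeout_s.

    Hypothetical helper: the returned file object must stay open until
    the watchdog is disarmed, because faulthandler writes to it later.
    """
    log_file = open(log_path, "a")
    # exit=False: only report the hang, do not kill the process.
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=log_file)
    return log_file


def disarm_hang_watchdog(log_file):
    """Cancel the pending dump and close the log (call on normal completion)."""
    faulthandler.cancel_dump_traceback_later()
    log_file.close()
```

Arming this per rank, with a per-rank log path, before the final reduce step would show where each rank is stuck if the job hangs again.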

No longer hanging, and it should now report in the log file if there are hangs. I suspect that there were inconsistencies in each rank's list of subjobs. The problem would be that different subjobs try to reduce at the same time and just wait...
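To illustrate the suspected failure mode: a collective reduce only completes when every rank calls it for the same subjob in the same order, so one rank with a differently ordered or longer list is enough to leave the others waiting. The sketch below is a hypothetical check, not the fix that went into this repository. It is plain Python and takes one subjob list per rank; in a real MPI job those lists could be collected first, for example with mpi4py's `comm.allgather`.

```python
def find_subjob_mismatches(subjobs_by_rank):
    """Compare every rank's subjob list against rank 0's list.

    subjobs_by_rank: list indexed by rank, each entry a list of subjob names.
    Returns a list of (rank, position, expected, actual) tuples, where
    expected is rank 0's entry at that position and None marks a missing entry.
    """
    reference = subjobs_by_rank[0]
    mismatches = []
    for rank, subjobs in enumerate(subjobs_by_rank[1:], start=1):
        # Walk the longer of the two lists so extra or missing subjobs show up.
        for pos in range(max(len(reference), len(subjobs))):
            expected = reference[pos] if pos < len(reference) else None
            actual = subjobs[pos] if pos < len(subjobs) else None
            if expected != actual:
                mismatches.append((rank, pos, expected, actual))
    return mismatches
```

Logging a non-empty result before the reduce step would show which rank disagrees and at which position, rather than leaving the job to hang silently.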

I'm closing this.