CRITICAL: job status reported by "cromshell list -u" is incorrect and never updates.
dalessioluca opened this issue · 6 comments
The command "cromshell list -u" is supposed to check the completion status of all unfinished jobs.
However, it sometimes reports incorrect values while "cromshell status" reports the correct ones.
Even after running "cromshell status" with a specific job ID, "cromshell list -u" keeps listing the old, incorrect status.
The implication is that the status reported by "cromshell list -u" is unreliable.
This could lead to jobs running silently while the user believes they have terminated, which is why this is a critical bug.
I have not figured out how to replicate the problem.
However, here are 8 examples of jobs that are listed as running but have in fact terminated.
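To collect more examples like these, one option is to compare the status cached in the database TSV against a live "cromshell status" query and print every disagreement. This is a minimal sketch, not part of cromshell: the function names find_stale and live_status and the CROMSHELL override variable are hypothetical, and it assumes a tab-separated TSV with a header row, the workflow ID and cached status at caller-supplied column positions, and "cromshell status" printing a JSON-style pair such as "status": "Running".

```shell
#!/bin/bash
# Hypothetical sketch: flag jobs whose cached status in the database TSV
# differs from what "cromshell status" reports right now.
# Assumptions: tab-separated TSV with a header row, ID in column id_col,
# cached status in column status_col, and "cromshell status" printing a
# JSON-style pair such as "status": "Running".
CROMSHELL=${CROMSHELL:-cromshell}

# Print the live status value for one workflow ID (empty if none found).
live_status() {
    "$CROMSHELL" status "$1" 2>/dev/null |
        grep -o '"status": *"[^"]*"' | head -n 1 |
        sed 's/.*"\([^"]*\)"$/\1/'
}

# Usage: find_stale DB_FILE [ID_COL] [STATUS_COL]
find_stale() {
    local db=$1 id_col=${2:-1} status_col=${3:-2}
    local tab
    tab=$(printf '\t')
    # Emit "id<TAB>cached_status" for every data row (header skipped).
    awk -F'\t' -v ic="$id_col" -v sc="$status_col" \
        'NR > 1 { print $ic "\t" $sc }' "$db" |
    while IFS=$tab read -r job_id cached; do
        live=$(live_status "$job_id")
        # Report only rows where a live status was found and it differs.
        if [ -n "$live" ] && [ "$live" != "$cached" ]; then
            printf '%s\tcached=%s\tlive=%s\n' "$job_id" "$cached" "$live"
        fi
    done
}
```

Run it as find_stale all.workflow.database.tsv ID_COL STATUS_COL, substituting the column positions of your own database; each output line is a job whose cached status is stale.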
Huh... I wonder why that's happening.
It probably has to do with how the TSV gets updated when you query or update it.
Somewhere in the "status" function, the ~/.cromshell/<TSV> file is updated. That's almost certainly where the problem lies.
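For comparison, here is a minimal sketch of what a safe single-row update of such a TSV could look like. It is not cromshell's code: update_status is a hypothetical name, and the tab separator and column positions (id_col, status_col) are assumptions. The rewrite goes to a temp file in the same directory and is renamed over the database only if awk succeeds, so an interrupted update cannot leave a half-written file.

```shell
#!/bin/bash
# Hypothetical sketch (not cromshell's implementation): set the status of
# one workflow ID in a tab-separated database file.
# Usage: update_status DB_FILE JOB_ID NEW_STATUS [ID_COL] [STATUS_COL]
update_status() {
    local db=$1 job_id=$2 new_status=$3
    local id_col=${4:-1} status_col=${5:-2}
    local tmp
    # Temp file in the same directory, so the final mv is an atomic rename.
    tmp=$(mktemp "${db}.XXXXXX") || return 1
    if awk -F'\t' -v OFS='\t' -v id="$job_id" -v st="$new_status" \
        -v ic="$id_col" -v sc="$status_col" \
        '$ic == id { $sc = st } { print }' "$db" > "$tmp"; then
        # Replace the database only after the full rewrite succeeded.
        mv "$tmp" "$db"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Rows whose ID does not match, including the header, are copied through unchanged.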
Bumped the priority of "list -u" in cromshell 2.0. @bshifaw
You can place this script in your .cromshell directory to check the status of your jobs. It simply runs "cromshell status" in a loop.
```shell
#!/bin/bash
# Collect the unique workflow IDs (third-to-last whitespace-separated column).
awk '{print $(NF-2)}' all.workflow.database.tsv | sort -u > id_to_check.txt  # check only the most current ids
# awk '{print $(NF-2)}' all.workflow.database.tsv* | sort -u > id_to_check.txt  # check all ids

rm -f status.txt
while read -r job_id; do
    # Skip the WDL_NAME entry, which is not a workflow ID
    if [ "$job_id" != 'WDL_NAME' ]; then
        status=$(cromshell status "$job_id" | grep "status")
        # $status is left unquoted on purpose so multi-line output is joined onto one line
        echo "$job_id" $status >> status.txt
    fi
done < id_to_check.txt

echo "The following jobs are running:"
# "unning" matches both "Running" and "running"
grep "unning" status.txt
```
The multiple files shouldn't be an issue; it should only be looking in all.workflow.database.tsv.
I'll take a look at this very soon.