broadinstitute/cromshell

"cromshell list" gives incorrect information after a "cromshell abort" command

dalessioluca opened this issue · 3 comments

Sometimes:

  1. I abort a job by its ID.
  2. When I run "cromshell list", the job is still listed as running.
  3. If I try to abort the job again, cromshell tells me that there is no job running with that ID.

So it seems that "cromshell abort" correctly killed the job on the first try but "cromshell list" is not picking up the up-to-date status for the recently aborted job.

Are you running cromshell list -u or just cromshell list? Without the update flag, list just prints the locally cached status of the runs you've submitted. With the update flag, it polls the cromwell server for the status of every non-terminated run. You can also run the cromshell status command, which updates the locally cached status info as well.
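
To make the difference concrete, here is a minimal Python sketch of the behavior described above. It is not cromshell's actual code: `list_runs`, `cache`, and `fetch_status` are hypothetical names, and the set of terminal statuses is an assumption based on Cromwell's usual workflow states.

```python
from typing import Callable, Dict

# Assumed set of terminal states; a run in one of these is never polled again.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Aborted"}


def list_runs(
    cache: Dict[str, str],
    fetch_status: Callable[[str], str],
    update: bool = False,
) -> Dict[str, str]:
    """Return workflow ID -> status from the local cache.

    When `update` is True (the analogue of `list -u`), every non-terminal
    entry is refreshed from the server first. `fetch_status` is a
    hypothetical stand-in for a call to the cromwell server.
    """
    if update:
        for workflow_id, status in list(cache.items()):
            if status not in TERMINAL_STATUSES:
                cache[workflow_id] = fetch_status(workflow_id)
    return dict(cache)
```

In this model, calling `list_runs(cache, fetch_status)` right after an abort still reports the old cached status, while `list_runs(cache, fetch_status, update=True)` reports whatever the server currently says.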

Cromshell doesn't update the local cached status when you run cromshell abort. This command only sends a request to the server to abort a run - it doesn't then wait for the run to be successfully aborted before exiting. Because of this I didn't want to change the local status to aborted unilaterally - I explicitly only have the statuses change when the server tells us new statuses.

For now you can run status on an aborted job to update the termination state (or alternatively list -u).

The submit function has a wait flag (-w) which will wait until the submitted workflow is running - would this be helpful for abort as well?
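
A wait option for abort could look roughly like the sketch below: after sending the abort request, poll the server until the run reaches a terminal state. This is only an illustration under assumptions, not an existing cromshell feature. `wait_until_terminal` and its parameters are hypothetical, and `fetch_status` stands in for whatever queries the server (for example, a wrapper around Cromwell's `GET /api/workflows/v1/{id}/status` endpoint).

```python
import time
from typing import Callable

# Assumed set of terminal states reported by the server.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Aborted"}


def wait_until_terminal(
    workflow_id: str,
    fetch_status: Callable[[str], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the workflow reaches a terminal status and return it.

    Raises TimeoutError if no terminal status is seen after roughly
    `timeout_seconds` of waiting. `sleep` is injectable so the loop can
    be exercised without real delays.
    """
    waited = 0.0
    while True:
        status = fetch_status(workflow_id)
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(
                f"workflow {workflow_id} still {status} after {waited:.0f}s"
            )
        sleep(poll_seconds)
        waited += poll_seconds
```

The local status would then only be written once the server itself reports a terminal state, which keeps the rule that statuses change only when the server says so.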

I should have been more precise.
I run the command cromshell list -u -c.
See the three attached images for:

  1. The output of cromshell list -u -c
    [image: before]

  2. The output of 'cromshell abort'
    [image: middle]

  3. The output of cromshell list -u -c again
    [image: after]

Hmm… that's interesting. It should be updating the status on list -c -u.

I'll try to reproduce the issue.