Simple script to interact with the cromwell server.
Suggestion: add the following line to your ~/.bashrc:
alias cromwell="python3 /PATH/TO/FILE/cromwell_interact.py"
making sure the path points to the file, then run source ~/.bashrc.
From now on you can simply invoke the script using the shortcut cromwell.
pyperclip: pip install pyperclip
dateutil: pip install python-dateutil
requests: pip install requests
The script is called with the following syntax:
cromwell [command]
where command is in the list {submit,meta,metadata,outfiles,abort,connect,log}
usage: cromwell_interact.py [-h] [--outpath OUTPATH] [--port PORT] [--http_port HTTP_PORT] {submit,meta,metadata,outfiles,abort,connect,log} ...
Run Cromwell commands from command line
positional arguments:
{submit,meta,metadata,outfiles,abort,connect,log}
help for subcommand
submit submit a job
meta (metadata) Requests metadata and summaries of workflows
outfiles (outfiles)
Prints out content of elems under
log prints the log
optional arguments:
-h, --help show this help message and exit
--outpath OUTPATH Path to wdl script
--port PORT SSH port
--http_port HTTP_PORT Cromwell
To further investigate the usage of each command, type:
cromwell [command] --help
cromwell connect [...]
usage: cromwell_interact.py connect [-h] server
positional arguments:
server Cromwell server name
optional arguments:
-h, --help show this help message and exit
Use this script to connect to the cromwell server from which metadata is fetched.
usage: cromwell_interact.py submit [-h] --wdl WDL [--inputs INPUTS] [--deps DEPS] [--label LABEL] (--options OPTIONS | --google_labels GOOGLE_LABELS)
optional arguments:
-h, --help show this help message and exit
--wdl WDL Path to wdl script
--inputs INPUTS Path to wdl inputs
--deps DEPS Path to zipped dependencies file
--label LABEL Label of the workflow
--options OPTIONS Workflow option json
--google_labels GOOGLE_LABELS, --l GOOGLE_LABELS
Labels (comma separated key=value list) of the workflow for google. Must contain product at minimum.
This is the command used to submit jobs to the server.
An example would be cromwell submit --wdl project.wdl --inputs project.json --options google_labels.json --label test
When a job is submitted, its info (date, wdl name, wdl id, label) is appended at the bottom of a file called workflows.log.
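The bookkeeping step can be pictured with a minimal sketch. The function name append_to_log and the tab-separated layout are assumptions made for illustration; only the four fields (date, wdl name, wdl id, label) come from the description above, and the script's actual on-disk format may differ.

```python
import datetime


def append_to_log(log_path, wdl_name, workflow_id, label=""):
    """Append one submission record to the log file and return the line written.

    Assumption: one record per line, fields separated by tabs, in the order
    date, wdl name, wdl id, label.
    """
    today = datetime.date.today().isoformat()
    line = "\t".join([today, wdl_name, workflow_id, label])
    # Mode "a" creates the file if needed and always writes at the end.
    with open(log_path, "a") as handle:
        handle.write(line + "\n")
    return line
```

Opening the file in append mode keeps older submissions untouched, which is what makes the most recent job the last line of the file.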
--inputs does not need to be specified, since the script will automatically look for a .json file with the same name as the .wdl, but it can be used to specify a separate input file.
--label sets the label that is stored with the job in the workflows.log file (see meta command info).
--deps is used in case of subworkflows.
It's required to specify either --options or --google_labels, as they allow us to monitor usage of resources based on each project.
--options is a json file where google_labels is the main key followed by other subkeys, one of which must be product.
Here's a json template to use.
{
"google_labels":{
"project":"your-project",
"product":"your-product"
}
}
Similarly, one can pass the same information on the command line as comma separated key=value pairs:
cromwell submit --wdl project.wdl --label test --google_labels project=your-project,product=your-product
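Two of the submit conventions described above lend themselves to a short sketch: the default inputs file named after the .wdl, and the conversion of a comma separated key=value list into the google_labels json structure. The function names default_inputs_path and parse_google_labels are hypothetical and the script's own logic may differ; only the file naming rule, the google_labels key and the mandatory product entry are taken from the text.

```python
from pathlib import Path


def default_inputs_path(wdl_path):
    """Return the .json path that shares its name with the given .wdl file."""
    return Path(wdl_path).with_suffix(".json")


def parse_google_labels(text):
    """Turn 'key=value,key=value' into the options structure shown above.

    Raises ValueError on a malformed pair or when 'product' is missing.
    """
    labels = {}
    for pair in text.split(","):
        key, sep, value = pair.partition("=")
        if not sep or not key.strip() or not value.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        labels[key.strip()] = value.strip()
    if "product" not in labels:
        raise ValueError("google labels must contain a 'product' key")
    return {"google_labels": labels}
```

With these helpers, parse_google_labels("project=your-project,product=your-product") yields the same dictionary as the json template, so either route gives the server identical labels.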
usage: cromwell_interact.py meta [-h] [--file FILE] [--minkeys] [--no_calls] [--summary] [--running] [--failed_jobs] [--summarize_failed_jobs]
[--print_jobs_with_status PRINT_JOBS_WITH_STATUS] [--cromwell_timeout CROMWELL_TIMEOUT]
[id]
positional arguments:
id workflow id
optional arguments:
-h, --help show this help message and exit
--file FILE Use already downloaded meta json file as data
--minkeys Print summary of workflow
--no_calls If don't get call level data. In this way failed jobs can be listed for a workflow with too many rows
--summary, -s Print summary of workflow
--running, -r Print whether it's running or not
--failed_jobs Print summary of failed jobs after each workflow
--summarize_failed_jobs
Print summary of failed jobs over all workflow
--print_jobs_with_status PRINT_JOBS_WITH_STATUS
Print summary of jobs with specific status jobs
--cromwell_timeout CROMWELL_TIMEOUT
Time in seconds to wait for response from cromwell
meta will store a json file in the tmp subfolder of the directory where the script is found. The json summarizes (based on the request) the metadata info of the cromwell run.
The id input does not need to be specified, as the last one is automatically fetched from the log file.
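A minimal sketch of these two defaults follows. It assumes that each log line holds whitespace separated fields with the workflow id in third position (the column order printed by the log command) and that the file contains no header line; the names last_workflow_id and metadata_path are hypothetical.

```python
from pathlib import Path


def last_workflow_id(log_text):
    """Return the workflow id of the most recent (last non-empty) log line."""
    lines = [line for line in log_text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("log is empty, a workflow id must be given explicitly")
    # Assumed field order: date, wdl name, wdl id, optional label.
    return lines[-1].split()[2]


def metadata_path(script_dir, workflow_id):
    """Return the json path inside the tmp subfolder next to the script."""
    return Path(script_dir) / "tmp" / f"{workflow_id}.json"
```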
cromwell meta -r will produce the following output:
14bf0081-b5fa-4cf9-a9fd-086e772b94cc
curl -X GET "http://localhost:80/api/workflows/v1/14bf0081-b5fa-4cf9-a9fd-086e772b94cc/status" -H "accept: application/json" --socks5 localhost:5000
{"status":"Succeeded","id":"14bf0081-b5fa-4cf9-a9fd-086e772b94cc"}CompletedProcess(args=['curl', '-X', 'GET', 'http://localhost:80/api/workflows/v1/14bf0081-b5fa-4cf9-a9fd-086e772b94cc/status', '-H', 'accept: application/json', '--socks5', 'localhost:5000'], returncode=0, stderr='')
which shows us that the job ran successfully.
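For readers who want to reuse the status check in their own code, here is a small sketch of the two pieces visible in the output above: the status URL and the json body returned by the server. The helper names status_url and parse_status are assumptions, and the sketch leaves out the actual request (the script issues it with curl through a SOCKS proxy, as the output shows).

```python
import json


def status_url(workflow_id, host="localhost", http_port=80):
    """Build the status endpoint URL seen in the curl call above."""
    return f"http://{host}:{http_port}/api/workflows/v1/{workflow_id}/status"


def parse_status(body):
    """Extract the workflow status from the json body of the response."""
    payload = json.loads(body)
    return payload["status"]
```

Feeding the json part of the output above to parse_status returns "Succeeded".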
In order to have a more structured breakdown one can use another flag:
cromwell meta ccbb148d-ab1e-4881-97ed-86ae583e7f20 -s
Metadata saved to /home/petekoti/Dropbox/Projects/CromwellInteract/tmp/ccbb148d-ab1e-4881-97ed-86ae583e7f20.json
Workflow name ldsc_rg
Current status Failed
Start 2021-10-01T14:47:50.648Z
End 2021-10-01T14:52:17.546Z
Call "ldsc_rg.filter_meta"
Basepath gs://fg-cromwell_fresh/ldsc_rg/ccbb148d-ab1e-4881-97ed-86ae583e7f20/call-filter_meta
job statuses Done:1
Max time: 0.14 minutes, min time 0.14 minutes , average time 0.14 minutes
Max job gs://fg-cromwell_fresh/ldsc_rg/ccbb148d-ab1e-4881-97ed-86ae583e7f20/call-filter_meta/stdout
Min job gs://fg-cromwell_fresh/ldsc_rg/ccbb148d-ab1e-4881-97ed-86ae583e7f20/call-filter_meta/stdout
Call "ldsc_rg.munge_ldsc"
Basepath gs://fg-cromwell_fresh/ldsc_rg/ccbb148d-ab1e-4881-97ed-86ae583e7f20/call-munge_ldsc/
job statuses Done:2
Max time: 0.29 minutes, min time 0.29 minutes , average time 0.29 minutes
Max job gs://fg-cromwell_fresh/ldsc_rg/ccbb148d-ab1e-4881-97ed-86ae583e7f20/call-munge_ldsc/shard-1/stdout
Min job gs://fg-cromwell_fresh/ldsc_rg/ccbb148d-ab1e-4881-97ed-86ae583e7f20/call-munge_ldsc/shard-0/stdout
Call "ldsc_rg.gather_h2"
Basepath gs://fg-cromwell_fresh/ldsc_rg/ccbb148d-ab1e-4881-97ed-86ae583e7f20/call-gather_h2
job statuses Failed_Failed:1
Max time: 3.85 minutes, min time 3.85 minutes , average time 3.85 minutes
Max job gs://fg-cromwell_fresh/ldsc_rg/ccbb148d-ab1e-4881-97ed-86ae583e7f20/call-gather_h2/stdout
Min job gs://fg-cromwell_fresh/ldsc_rg/ccbb148d-ab1e-4881-97ed-86ae583e7f20/call-gather_h2/stdout
Total call statuses across subcalls:
Calls for ldsc_rg.filter_meta... Done:1
Calls for ldsc_rg.munge_ldsc... Done:2
Calls for ldsc_rg.gather_h2... Failed_Failed:1
This command will produce metadata for each task, showing where each task potentially failed. One can fetch more info about failed jobs with the --failed_jobs flag:
cromwell meta ccbb148d-ab1e-4881-97ed-86ae583e7f20 --failed_jobs
ccbb148d-ab1e-4881-97ed-86ae583e7f20
curl -X GET "http://localhost:80/api/workflows/v1/ccbb148d-ab1e-4881-97ed-86ae583e7f20/metadata?expandSubWorkflows=false" -H "accept: application/json" --socks5 localhost:5000
Metadata saved to /home/petekoti/Dropbox/Projects/CromwellInteract/tmp/ccbb148d-ab1e-4881-97ed-86ae583e7f20.json
Workflow name ldsc_rg
Current status Failed
Start 2021-10-01T14:47:50.648Z
End 2021-10-01T14:52:17.546Z
Call "ldsc_rg.filter_meta"
Basepath gs://fg-cromwell_fresh/ldsc_rg/ccbb148d-ab1e-4881-97ed-86ae583e7f20/call-filter_meta
job statuses Done:1
Max time: 0.14 minutes, min time 0.14 minutes , average time 0.14 minutes
Max job gs://fg-cromwell_fresh/ldsc_rg/ccbb148d-ab1e-4881-97ed-86ae583e7f20/call-filter_meta/stdout
Min job gs://fg-cromwell_fresh/ldsc_rg/ccbb148d-ab1e-4881-97ed-86ae583e7f20/call-filter_meta/stdout
FAILED JOBS:
No failed jobs!
Call "ldsc_rg.munge_ldsc"
Basepath gs://fg-cromwell_fresh/ldsc_rg/ccbb148d-ab1e-4881-97ed-86ae583e7f20/call-munge_ldsc/
job statuses Done:2
Max time: 0.29 minutes, min time 0.29 minutes , average time 0.29 minutes
Max job gs://fg-cromwell_fresh/ldsc_rg/ccbb148d-ab1e-4881-97ed-86ae583e7f20/call-munge_ldsc/shard-1/stdout
Min job gs://fg-cromwell_fresh/ldsc_rg/ccbb148d-ab1e-4881-97ed-86ae583e7f20/call-munge_ldsc/shard-0/stdout
FAILED JOBS:
No failed jobs!
Call "ldsc_rg.gather_h2"
Basepath gs://fg-cromwell_fresh/ldsc_rg/ccbb148d-ab1e-4881-97ed-86ae583e7f20/call-gather_h2
job statuses Failed_Failed:1
Max time: 3.85 minutes, min time 3.85 minutes , average time 3.85 minutes
Max job gs://fg-cromwell_fresh/ldsc_rg/ccbb148d-ab1e-4881-97ed-86ae583e7f20/call-gather_h2/stdout
Min job gs://fg-cromwell_fresh/ldsc_rg/ccbb148d-ab1e-4881-97ed-86ae583e7f20/call-gather_h2/stdout
FAILED JOBS:
Failed shard# -1
Task ldsc_rg.gather_h2:NA:1 failed. The job was stopped before the command finished. PAPI error code 9. Execution failed: generic::failed_precondition: while running "/cromwell_root/script": unexpected exit status 2 was not ignored
[UserAction] Unexpected exit status 2 while running "/cromwell_root/script": /cromwell_root/script: line 29: syntax error near unexpected token `('
/cromwell_root/script: line 29: `while read f; do echo $f; done < (cat /cromwell_root/fg-cromwell_fresh/ldsc_rg/ccbb148d-ab1e-4881-97ed-86ae583e7f20/call-gather_h2/write_lines_ef8cece4de0c3f7ddfe5ac466c5f26a3.tmp)'
Total call statuses across subcalls:
Calls for ldsc_rg.filter_meta... Done:1
Calls for ldsc_rg.munge_ldsc... Done:2
Calls for ldsc_rg.gather_h2... Failed_Failed:1
The output shows how and where each task failed.
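The per-call status counts can also be recomputed from the saved metadata json. The sketch below assumes the usual layout of a Cromwell metadata response, where a top-level calls object maps each call name to a list of entries carrying executionStatus and shardIndex; the function names are hypothetical, and the sketch does not try to reproduce the combined labels such as Failed_Failed printed by the script.

```python
from collections import Counter


def summarize_calls(metadata):
    """Count execution statuses per call, e.g. {'wf.task': Counter({'Done': 2})}."""
    summary = {}
    for call_name, entries in metadata.get("calls", {}).items():
        summary[call_name] = Counter(
            entry.get("executionStatus", "Unknown") for entry in entries
        )
    return summary


def failed_shards(metadata):
    """List (call name, shard index) pairs for entries whose status is Failed.

    A shard index of -1 denotes a call that was not scattered.
    """
    failed = []
    for call_name, entries in metadata.get("calls", {}).items():
        for entry in entries:
            if entry.get("executionStatus") == "Failed":
                failed.append((call_name, entry.get("shardIndex", -1)))
    return failed
```

Loading the file from the tmp subfolder with json.load and passing the result to these functions gives a compact view of which calls to inspect first.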
This command is a shortcut to navigate the workflows.log file that exists in the directory.
usage: cromwell_interact.py log [-h] [--n N] [--kw KW]
optional arguments:
-h, --help show this help message and exit
--n N number of latest jobs to print
--kw KW Search for keyword
cromwell log will return the last 10 jobs submitted as:
DATE WDL_NAME WDL_ID LABEL
2021-10-05 new 14cf9023-eb3c-49ad-8d34-03734c9b0fcb 1k
2021-10-05 new 1eda5cc3-ea04-4e71-8952-d01f41177f45 1k
2021-10-06 new d0b5590a-c2c2-499d-b4da-7e1c4dbe5679 full-run
2021-10-06 new b8961554-80be-49c1-9367-5255cddbfdd4 full-run
2021-10-06 new 6e8cbd31-9482-4e06-96c1-9926d333bb9e full-run
2021-10-06 new 948c7b3b-2cd3-4691-bc67-4e8b3c60bca0 full-run
2021-10-06 new 17922088-0884-442d-88d7-8db0c9e7ac9c full-run
2021-10-06 new 9186fbb9-47eb-46c5-9b80-19581e768cb3 full-run
2021-10-07 new 1afdb0e4-198b-4ff3-be09-27fdbd547db0
2021-10-07 new 14bf0081-b5fa-4cf9-a9fd-086e772b94cc test_files
One can change the number of lines with --n.
--kw searches for specific substrings in either the wdl name or the label field. In this way one can fetch specific old jobs that share a pattern.
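The behaviour of --n and --kw can be sketched as follows. This is an illustration under assumptions rather than the script's code: fields are taken to be whitespace separated in the order shown above, the label may be missing, and the keyword filter is applied before the last n rows are selected. The names parse_log and latest_jobs are hypothetical.

```python
def parse_log(log_text):
    """Split the log into rows with date, wdl_name, wdl_id and label keys."""
    rows = []
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        date, wdl_name, wdl_id = fields[:3]
        label = " ".join(fields[3:])  # empty string when no label was given
        rows.append(
            {"date": date, "wdl_name": wdl_name, "wdl_id": wdl_id, "label": label}
        )
    return rows


def latest_jobs(rows, n=10, kw=None):
    """Return the last n rows, optionally keeping only rows matching kw."""
    if kw is not None:
        rows = [r for r in rows if kw in r["wdl_name"] or kw in r["label"]]
    return rows[-n:] if n > 0 else []
```

For example, latest_jobs(parse_log(text), n=5, kw="full-run") would keep at most five of the most recent jobs whose name or label contains full-run.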
Simple command to terminate a running wdl
usage: cromwell_interact.py abort [-h] id
positional arguments:
id workflow id
In this case the id argument is required as a safety measure.