This project provides a simple, lightweight dashboard for analyzing the metadata of Cromwell workflows. It has only been tested with the GCP and AWS backends.
This project is based on a script from the Broad Institute's gatk-sv repository.
- High-level statistics (task counts, duration, CPU hours)
- Descriptive statistics about task durations
- Timeline of the entire workflow
- Compute resources used over time
Open the Cromwell server's web page and call the /api/workflows/{version}/{id}/metadata endpoint with the following inputs:
id: <workflow_id>
includeKey: id
includeKey: executionStatus
includeKey: backendStatus
includeKey: status
includeKey: callRoot
includeKey: subWorkflowMetadata
includeKey: subWorkflowId
expandSubWorkflows: true
Save the output as metadata.json.
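The same request can be scripted instead of using the web page. Below is a minimal sketch using only the Python standard library; the server address (http://localhost:8000) and API version (v1) are assumptions, so adjust them to your deployment.

```python
# Sketch: fetch workflow metadata from a Cromwell server with the same
# includeKey filters listed above, then save it as metadata.json.
# Assumes the server listens at http://localhost:8000 and API version v1.
import json
import urllib.parse
import urllib.request

CROMWELL_URL = "http://localhost:8000"  # assumed default; change as needed


def build_metadata_url(workflow_id, base_url=CROMWELL_URL):
    """Build the metadata endpoint URL with the filters used by the profiler."""
    params = [
        ("includeKey", "id"),
        ("includeKey", "executionStatus"),
        ("includeKey", "backendStatus"),
        ("includeKey", "status"),
        ("includeKey", "callRoot"),
        ("includeKey", "subWorkflowMetadata"),
        ("includeKey", "subWorkflowId"),
        ("expandSubWorkflows", "true"),
    ]
    query = urllib.parse.urlencode(params)
    return f"{base_url}/api/workflows/v1/{workflow_id}/metadata?{query}"


def save_metadata(workflow_id, out_path="metadata.json"):
    """Download the filtered metadata and write it to out_path."""
    with urllib.request.urlopen(build_metadata_url(workflow_id)) as resp:
        metadata = json.load(resp)
    with open(out_path, "w") as fh:
        json.dump(metadata, fh)
```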
Alternatively, if you have cromshell installed, run:
cromshell -t100 metadata <workflow_id> > metadata.json
pip install --upgrade streamlit pandas plotly-express
In most cases there is no need to clone the repo; you can run the app directly from GitHub:
streamlit run https://raw.githubusercontent.com/henriqueribeiro/cromwell_profiler/main/profiler.py
Alternatively, clone the repo and run the app locally:
git clone https://github.com/henriqueribeiro/cromwell_profiler.git
streamlit run profiler.py
After launching the app, a new page will open in your browser. Upload the metadata file and the plots will start appearing.
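Before uploading, you can sanity-check the metadata file from the command line. The sketch below is not part of the profiler; it only assumes the standard Cromwell metadata layout, where "calls" maps each task name to its list of call attempts.

```python
# Optional sanity check: summarize a Cromwell metadata file before
# uploading it to the dashboard. Assumes the standard metadata layout
# ("id", "status", and a "calls" dict of task name -> call attempts).
import json


def summarize_metadata(metadata):
    """Return the workflow id, status, and number of call attempts per task."""
    calls = metadata.get("calls", {})
    return {
        "id": metadata.get("id"),
        "status": metadata.get("status"),
        "calls_per_task": {task: len(attempts) for task, attempts in calls.items()},
    }


def summarize_file(path="metadata.json"):
    """Load a metadata JSON file and summarize it."""
    with open(path) as fh:
        return summarize_metadata(json.load(fh))
```

If the summary shows zero calls or an unexpected status, the metadata was likely fetched with the wrong filters.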
Streamlit sets a maximum size for uploaded files (200 MB by default). If your metadata file is larger, do the following:
- Clone the repo
git clone https://github.com/henriqueribeiro/cromwell_profiler.git
- Increase the maximum file size allowed
cd cromwell_profiler
mkdir -p .streamlit
cat <<EOT > .streamlit/config.toml
[server]
maxUploadSize=1024
EOT
In this example we set the maximum file size to 1024 MB.
- Re-launch the profiler
streamlit run profiler.py
Please feel free to open PRs with new features, plots, etc.