/workflows

Bioinformatics workflows developed for and used on the St. Jude Cloud project.

This repository contains all bioinformatics workflows used on the St. Jude Cloud project. The repository is officially in beta: workflows are added as they are developed and put into production.

🏠 Homepage

Getting Started

At the time of writing, all workflows are written in WDL and are tested using Cromwell. We use Oliver to easily interact with the Cromwell server to perform various tasks. Although we do not test outside of Cromwell, we expect that the workflows will work just as well using other runners.

The easiest way to get started is to install bioconda and then run the following commands:

conda create -n workflows-dev -c conda-forge cromwell -y
conda activate workflows-dev
git clone git@github.com:stjudecloud/workflows.git
cd workflows

Any of the workflows in the workflows folder is a good place to start, e.g.

cromwell run workflows/reference/bootstrap-reference.wdl --inputs workflows/reference/inputs.json
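
If the command above fails, the usual causes are a missing file or an inactive conda environment. The wrapper below is a minimal sketch of a pre-flight check around the same `cromwell run` invocation. The function name `run_workflow` is hypothetical and is not part of this repository.

```shell
# Hypothetical helper (not part of this repository): check the workflow
# and inputs files, and that cromwell is available, before running.
# Usage: run_workflow <workflow.wdl> <inputs.json>
run_workflow() {
  local wdl="$1"
  local inputs="$2"

  if [ ! -f "$wdl" ]; then
    echo "workflow file not found: $wdl" >&2
    return 1
  fi
  if [ ! -f "$inputs" ]; then
    echo "inputs file not found: $inputs" >&2
    return 1
  fi
  # cromwell is provided by the workflows-dev conda environment.
  if ! command -v cromwell >/dev/null 2>&1; then
    echo "cromwell not found; run 'conda activate workflows-dev' first" >&2
    return 1
  fi

  # Same invocation as shown above.
  cromwell run "$wdl" --inputs "$inputs"
}
```

For example, from the repository root: `run_workflow workflows/reference/bootstrap-reference.wdl workflows/reference/inputs.json`.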

Repository Structure

The repository is laid out as follows:

  • bin - Scripts used by Cromwell configuration settings. Add this to $PATH prior to using configurations in conf with Cromwell.
  • conf - Cromwell configuration files created for various environments that we use across our team. Feel free to use/fork/suggest improvements.
  • docker - Dockerfiles used in our workflows. All docker images are published to Docker Hub as a part of our CI and are versioned.
  • tools - All tools we have wrapped as individual WDL tasks.
  • workflows - Directory containing all end-to-end bioinformatics workflows.
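
The `bin` entry above asks for that directory to be on `$PATH` before the configurations in `conf` are used. One way to do that is sketched below. The function name `add_repo_bin_to_path` is hypothetical, and the sketch assumes it is called with the path to a local clone of this repository (defaulting to the current directory).

```shell
# Hypothetical helper (not part of this repository): prepend the
# repository's bin directory to PATH, but only if it is not already there.
# Usage: add_repo_bin_to_path [repo_root]   (repo_root defaults to $PWD)
add_repo_bin_to_path() {
  local repo_root="${1:-$PWD}"
  local bin_dir="$repo_root/bin"

  case ":$PATH:" in
    *":$bin_dir:"*)
      # Already on PATH; leave it unchanged.
      ;;
    *)
      PATH="$bin_dir:$PATH"
      ;;
  esac
  export PATH
}
```

Calling `add_repo_bin_to_path` from the repository root is then enough for the current shell session. Cromwell reads its configuration from the `config.file` Java system property. How that property is passed depends on how Cromwell was installed, so it is left out of the sketch.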

Workflows Available

The repository currently contains the following workflows:

| Name | Version | Description | Specification | Workflow | Status |
| --- | --- | --- | --- | --- | --- |
| RNA-Seq Standard | v2.0.0 | Standard RNA-Seq harmonization pipeline. | Specification | Realign BAM Workflow, FastQ Workflow | In Production |
| Build STAR References | N/A | Build STAR aligner reference files used in RNA-Seq Standard harmonization pipelines. | None | Workflow | In Production |
| Quality Check Standard | v1.0.0 | Perform ~10 different QC analyses on a BAM file and compile the results using MultiQC. | Specification | Workflow | In Production |
| Build FastQ Screen References | N/A | Build references used in WGS/WES Quality Check pipeline for running FastQ Screen. | None | Workflow | In Production |
| ESTIMATE | v1.0.0 (beta) | Runs the ESTIMATE software package on a feature counts file. | None | Workflow | In Development |
| Calculate Gene Lengths | N/A | Produces a gene length file from a GTF. | None | Workflow | In Production |
| Build BWA References | N/A | Builds reference files used by the BWA aligner. | None | Workflow | In Production |
| BAM to FastQs | v1.0.0 | Split a BAM file into read groups, then read 1 FastQs and read 2 FastQs. | None | Workflow | In Production |

Author

👤 St. Jude Cloud Team

Tests

Given that this repo is still new, there are no tests. When we add tests, we will update the README.

🤝 Contributing

Contributions, issues and feature requests are welcome!
Feel free to check the issues page. You can also take a look at the contributing guide.

📝 License

Copyright © 2020-Present St. Jude Cloud Team.
This project is MIT licensed.