This pipeline demonstrates the ENCODE pipeline reproducibility framework. Any pipeline deployed in this framework can be run on the cloud, on compute clusters with job-submission engines, or on stand-alone machines, and it inherently makes use of parallelized/distributed computing. Installation is simple because most dependencies are installed automatically.
Here we implement a simple bioinformatics pipeline but surround it with all of the ENCODE pipeline reproducibility infrastructure. The bioinformatic task is to use the Trimmomatic software to trim input FASTQs. The output includes the trimmed FASTQ and a plot of FASTQ quality scores before and after trimming. For simplicity this demo supports only single-end FASTQs.
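To make the trimming step concrete, the following is a simplified Python sketch of sliding-window quality trimming, the kind of operation that Trimmomatic's SLIDINGWINDOW step performs. It is not Trimmomatic's implementation and may differ from it in edge cases. The function names are hypothetical, and Phred+33 quality encoding is assumed.

```python
def phred33_to_scores(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]


def sliding_window_trim(qualities, window_size=2, required_quality=30):
    """Return the number of leading bases to keep.

    Scan from the 5' end and cut at the start of the first window whose
    mean quality falls below required_quality. Reads shorter than the
    window are left untouched in this sketch.
    """
    for start in range(len(qualities) - window_size + 1):
        window = qualities[start:start + window_size]
        if sum(window) / window_size < required_quality:
            return start
    return len(qualities)


def trim_read(sequence, quality_string, window_size=2, required_quality=30):
    """Trim a read and its quality string to the same kept length."""
    keep = sliding_window_trim(
        phred33_to_scores(quality_string), window_size, required_quality
    )
    return sequence[:keep], quality_string[:keep]


# Window 2, threshold 30, matching the "2:30" setting used in this demo.
print(trim_read("ACGTACGT", "IIII?5##"))  # ('ACGT', 'IIII')
```

In the example, the first window whose mean drops below 30 starts at the fifth base (scores 30 and 20, mean 25), so only the first four bases are kept.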
After experimenting with this repo you can create your own fork and use it as a template to deploy your own pipeline, inheriting all of the multi-platform and reproducibility features.
- Clone this repo and install dependencies:
- Add a single-end FASTQ and the Trimmomatic SLIDINGWINDOW parameter (here 2:30, which trims a read once the average quality score in a two-base window drops below 30) to input.json:
{
    "toy.fastqs": [
        "test/test_data/file1.fastq.gz"
    ],
    "toy.SLIDINGWINDOW": "2:30"
}
- Run the WDL workflow using input.json, Cromwell, and the Docker backend:
$ java -jar -Dconfig.file=backends/backend.conf cromwell-35.jar run toy.wdl -i input.json -o workflow_opts/docker.json
- Examine output JSON:
{
    "outputs": {
        "toy.trimmed_output": ["[cromwell/plot/task/execution/path]/trimmed.file1.fastq.gz"],
        "toy.plots": ["[cromwell/plot/task/execution/path]/file1_untrimmed_file1_trimmed_quality_scores.png"]
    },
    "id": "abc123"
}
- Examine quality score plot:
# Mac only
$ open [cromwell/plot/task/execution/path]/file1_untrimmed_file1_trimmed_quality_scores.png
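If you would rather collect the output paths programmatically than read them off the terminal, a minimal Python sketch could look like the following. It assumes the output JSON shown above has been copied into a string (or read from a file you saved yourself). The helper name `collect_outputs` is hypothetical, and the bracketed paths are the same placeholders used above, not real locations.

```python
import json


def collect_outputs(output_json_text, workflow="toy"):
    """Map each workflow output name (without the 'toy.' prefix) to its file list."""
    doc = json.loads(output_json_text)
    prefix = workflow + "."
    return {
        key[len(prefix):]: paths
        for key, paths in doc.get("outputs", {}).items()
        if key.startswith(prefix)
    }


# Example document with the same shape as the output JSON shown above.
example = """
{
    "outputs": {
        "toy.trimmed_output": ["[cromwell/plot/task/execution/path]/trimmed.file1.fastq.gz"],
        "toy.plots": ["[cromwell/plot/task/execution/path]/file1_untrimmed_file1_trimmed_quality_scores.png"]
    },
    "id": "abc123"
}
"""

for name, paths in collect_outputs(example).items():
    for path in paths:
        print(name, path)
```

Stripping the workflow prefix keeps the keys short (`trimmed_output`, `plots`) while still ignoring any entries that do not belong to the `toy` workflow.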