Workflows used for germline short variant discovery in WGS data
This WDL pipeline implements data pre-processing and initial variant calling (GVCF generation) according to the GATK Best Practices (June 2016) for germline SNP and Indel discovery in human whole-genome sequencing data.
Requirements/expectations:
- Human whole-genome paired-end sequencing data in unmapped BAM (uBAM) format
- One or more read groups, one per uBAM file, all belonging to a single sample (SM)
- Input uBAM files must additionally comply with the following requirements:
  - filenames all have the same suffix (we use ".unmapped.bam")
  - files must pass validation by ValidateSamFile
  - reads are provided in query-sorted order
  - all reads must have an RG tag
- Reference genome must be Hg38 with ALT contigs
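
Some of the input requirements above can be checked before launching the workflow. The following is a minimal sketch in Python 3, not part of the pipeline itself. It assumes you have already extracted each uBAM header as text (for example with `samtools view -H`), and the function names `parse_sam_header` and `check_inputs` are hypothetical. It covers only the filename suffix, the query-sorted order declared in the header, the one-read-group-per-file rule, and the single-sample rule. It does not replace ValidateSamFile and does not verify that every read carries an RG tag.

```python
def parse_sam_header(header_text):
    """Return (sort_order, read_groups) parsed from SAM header text.

    sort_order is the SO value of the @HD line (or None if absent);
    read_groups is a list of dicts, one per @RG line, keyed by tag name.
    """
    sort_order = None
    read_groups = []
    for line in header_text.splitlines():
        fields = line.split("\t")
        # Header fields after the record type look like TAG:VALUE.
        tags = dict(f.split(":", 1) for f in fields[1:] if ":" in f)
        if fields[0] == "@HD":
            sort_order = tags.get("SO")
        elif fields[0] == "@RG":
            read_groups.append(tags)
    return sort_order, read_groups


def check_inputs(headers_by_path, suffix=".unmapped.bam"):
    """Return a list of problems found across all uBAM inputs.

    headers_by_path maps each uBAM path to its header text.
    An empty list means none of the checks below failed.
    """
    problems = []
    samples = set()
    for path, header_text in sorted(headers_by_path.items()):
        if not path.endswith(suffix):
            problems.append("%s: filename does not end with %s" % (path, suffix))
        sort_order, read_groups = parse_sam_header(header_text)
        if sort_order != "queryname":
            problems.append("%s: sort order is %r, expected 'queryname'"
                            % (path, sort_order))
        if len(read_groups) != 1:
            problems.append("%s: expected 1 read group, found %d"
                            % (path, len(read_groups)))
        samples.update(rg.get("SM", "<missing>") for rg in read_groups)
    if len(samples) > 1:
        problems.append("read groups span multiple samples: %s" % sorted(samples))
    return problems
```

Calling `check_inputs` with a dictionary of path-to-header-text pairs returns an empty list when all four checks pass, and one message per failed check otherwise.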
Outputs:
- CRAM, CRAM index, and CRAM md5
- GVCF and its index
- BQSR report
- Several summary metrics
Software version requirements:
- GATK 4.0.10.1
- Picard 2.16.0-SNAPSHOT
- Samtools 1.3.1
- Python 2.7
- Cromwell version support
  - Successfully tested on v37
  - Does not work on versions < v23 due to output syntax
Important notes:
- The provided JSON is meant to be a ready-to-use example JSON template of the workflow. It is the user’s responsibility to correctly set the reference and resource input variables using the GATK Tool and Tutorial Documentation.
- Relevant reference and resource bundles can be accessed in the Resource Bundle.
- Runtime parameters are optimized for Broad's Google Cloud Platform implementation.
- For help running workflows on the Google Cloud Platform or locally, please view the following tutorial: (How to) Execute Workflows from the gatk-workflows Git Organization.
- The following material is provided by the GATK Team. Please post any questions or concerns to one of our forum sites: GATK, FireCloud or Terra, WDL/Cromwell.
- Please visit the User Guide site for further documentation on our workflows and tools.
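
Because the reference and resource inputs in the JSON template must be set by the user, it can help to list every file-like value in the inputs JSON and review each one before submitting a run. The sketch below, in Python 3, is one way to do that. The function name `list_file_inputs` and the input keys used in the usage note are hypothetical and are not taken from the provided template. Treating values that start with `gs://` or `/` as file paths is also an assumption.

```python
import json


def list_file_inputs(inputs, prefixes=("gs://", "/")):
    """Return sorted (key, value) pairs whose values look like file paths.

    inputs is the dictionary loaded from a workflow inputs JSON.
    Lists and nested objects are searched recursively; the key reported
    for a nested value records its position, e.g. "Key[0]" or "Key.sub".
    """
    found = []

    def visit(key, value):
        if isinstance(value, str) and value.startswith(prefixes):
            found.append((key, value))
        elif isinstance(value, list):
            for i, item in enumerate(value):
                visit("%s[%d]" % (key, i), item)
        elif isinstance(value, dict):
            for sub_key, sub_value in value.items():
                visit("%s.%s" % (key, sub_key), sub_value)

    for key, value in inputs.items():
        visit(key, value)
    return sorted(found)


def load_inputs(path):
    """Load a workflow inputs JSON file into a dictionary."""
    with open(path) as handle:
        return json.load(handle)
```

For example, `list_file_inputs(load_inputs("my.inputs.json"))` (with a hypothetical filename) returns the paths to check against the Resource Bundle. Non-path values such as sample names or integer settings are skipped.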
Copyright Broad Institute, 2019 | BSD-3
This script is released under the WDL open source code license (BSD-3) (full license text at https://github.com/openwdl/wdl/blob/master/LICENSE). Note, however, that the programs it calls may be subject to different licenses. Users are responsible for checking that they are authorized to run all programs before running this script.