five-dollar-genome-analysis-pipeline

Workflows used for germline short variant discovery in WGS data

germline_single_sample_workflow :

This WDL pipeline implements data pre-processing and initial variant calling (GVCF generation) according to the GATK Best Practices (June 2016) for germline SNP and Indel discovery in human whole-genome sequencing data.

Requirements/expectations

  • Human whole-genome paired-end sequencing data in unmapped BAM (uBAM) format
  • One or more read groups, one per uBAM file, all belonging to a single sample (SM)
  • Input uBAM files must additionally comply with the following requirements:
    • filenames all have the same suffix (we use ".unmapped.bam")
    • files must pass validation by ValidateSamFile
    • reads are provided in query-sorted (queryname) order
    • all reads must have an RG tag
  • Reference genome must be Hg38 with ALT contigs
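The filename and validation requirements above can be checked before launching the workflow. The following is a minimal sketch, not part of the pipeline itself: the helper names and example file names are hypothetical, and the command assumes the GATK4 `gatk` wrapper is on your PATH (adjust it if you call a Picard jar directly). It only builds the ValidateSamFile command; it does not run it.

```python
from pathlib import PurePosixPath

EXPECTED_SUFFIX = ".unmapped.bam"  # the suffix this README uses; yours may differ


def files_missing_suffix(paths, suffix=EXPECTED_SUFFIX):
    """Return the paths whose file name does not end with the shared suffix."""
    return [p for p in paths if not PurePosixPath(p).name.endswith(suffix)]


def validate_command(ubam_path):
    """Build (but do not run) a ValidateSamFile call for one uBAM.

    Assumption: the GATK4 `gatk` wrapper is available on PATH.
    """
    return ["gatk", "ValidateSamFile", "--INPUT", ubam_path, "--MODE", "SUMMARY"]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    ubams = ["sampleA.rg1.unmapped.bam", "sampleA.rg2.bam"]
    print("Wrong suffix:", files_missing_suffix(ubams))
    print("Validate with:", " ".join(validate_command(ubams[0])))
```

Files reported by `files_missing_suffix` should be renamed, and each uBAM should pass the printed validation command before it is listed in the inputs JSON.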

Outputs

  • CRAM, CRAM index, and CRAM MD5
  • GVCF and its index
  • BQSR report
  • Several summary metrics

Software version requirements :

  • GATK 4.0.10.1
  • Picard 2.16.0-SNAPSHOT
  • Samtools 1.3.1
  • Python 2.7
  • Cromwell version support
    • Successfully tested on v37
    • Does not work on versions < v23 due to output syntax
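The Cromwell constraints above amount to a simple version comparison. As a rough sketch, assuming you already have the version as text (for example `v37`; the function names here are hypothetical and the exact text your Cromwell install prints may differ):

```python
import re

MIN_SUPPORTED = 23  # README: versions below v23 do not work
TESTED = 37         # README: successfully tested on v37


def cromwell_major(version_text):
    """Extract the first integer from text such as 'v37' or 'cromwell 37'."""
    match = re.search(r"\d+", version_text)
    if match is None:
        raise ValueError(f"no version number in {version_text!r}")
    return int(match.group())


def check_cromwell(version_text):
    """Classify a Cromwell version against the constraints in this README."""
    major = cromwell_major(version_text)
    if major < MIN_SUPPORTED:
        return "unsupported"
    return "tested" if major == TESTED else "untested"
```

Versions at or above v23 other than v37 are reported as "untested" because this README makes no claim about them either way.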

Important Note :

  • The provided JSON is a ready-to-use example template for the workflow. It is the user’s responsibility to correctly set the reference and resource input variables using the GATK Tool and Tutorial Documentation.
  • Relevant reference and resource bundles can be accessed in the Resource Bundle.
  • Runtime parameters are optimized for Broad's Google Cloud Platform implementation.
  • For help running workflows on the Google Cloud Platform or locally, please see the tutorial (How to) Execute Workflows from the gatk-workflows Git Organization.
  • The following material is provided by the GATK Team. Please post any questions or concerns to one of our forum sites: GATK, FireCloud or Terra, WDL/Cromwell.
  • Please visit the User Guide site for further documentation on our workflows and tools.
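Because the reference and resource inputs in the example JSON must be reviewed by the user, it can help to list every input value that looks like a file location. The sketch below is one way to do that and is not part of the workflow; the input keys in the usage example are hypothetical placeholders, not the workflow's real input names. It flags any string that starts with `gs://` or contains a slash, so Docker image names may also appear in the output.

```python
import json


def path_like_inputs(inputs):
    """Yield (key, value) for every string input that looks like a file location."""

    def walk(key, value):
        if isinstance(value, str):
            if value.startswith("gs://") or "/" in value:
                yield key, value
        elif isinstance(value, list):
            for item in value:
                yield from walk(key, item)
        elif isinstance(value, dict):
            for sub_key, item in value.items():
                yield from walk(f"{key}.{sub_key}", item)

    for key, value in inputs.items():
        yield from walk(key, value)


if __name__ == "__main__":
    # Hypothetical keys and paths, for illustration only.
    example = json.loads(
        '{"Wf.ref_fasta": "gs://my-bucket/ref.fasta",'
        ' "Wf.preemptible_tries": 3,'
        ' "Wf.ubams": ["gs://my-bucket/a.unmapped.bam"]}'
    )
    for key, value in path_like_inputs(example):
        print(f"{key}\t{value}")
```

To review a real inputs file, load it with `json.load` and pass the resulting dictionary to `path_like_inputs`, then confirm that each listed location exists and matches your reference build.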

LICENSING :

Copyright Broad Institute, 2019 | BSD-3. This script is released under the WDL open source code license (BSD-3); the full license text is at https://github.com/openwdl/wdl/blob/master/LICENSE. Note, however, that the programs it calls may be subject to different licenses. Users are responsible for checking that they are authorized to run all programs before running this script.