gatk4-somatic-cnvs

This repo is archived. These workflows are now housed in the GATK repository under the scripts directory and are also organized in Dockstore in the GATK Best Practices Workflows collection.

Primary language: WDL | License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause)

somatic-cnvs

Purpose :

Workflows for somatic copy number variant analysis.

cnv_somatic_panel_workflow :

Builds a panel of normals (PoN) for the CNV pair workflow.

Requirements/Expectations

Important: The normal_bams samples in the JSON can be used to test the WDL; they are NOT to be used to create a panel of normals for sequence analysis. For instructions on creating a proper PoN, please refer to the user documents Panel of Normals and Generate a CNV panel of normals with CreateReadCountPanelOfNormals.

The reference used must be the same between PoN and case samples.

  • CNVSomaticPanelWorkflow.gatk_docker -- GATK Docker image (e.g., broadinstitute/gatk:latest).
  • CNVSomaticPanelWorkflow.intervals -- Picard or GATK-style interval list. For WGS, this should typically only include the autosomal chromosomes.
  • CNVSomaticPanelWorkflow.normal_bais -- List of BAI files. This list must correspond to normal_bams. For example, ["Sample1.bai", "Sample2.bai"].
  • CNVSomaticPanelWorkflow.normal_bams -- List of BAM files. This list must correspond to normal_bais. For example, ["Sample1.bam", "Sample2.bam"].
  • CNVSomaticPanelWorkflow.pon_entity_id -- Name of the final PoN file.
  • CNVSomaticPanelWorkflow.ref_fasta_dict -- Path to reference dict file.
  • CNVSomaticPanelWorkflow.ref_fasta_fai -- Path to reference fasta fai file.
  • CNVSomaticPanelWorkflow.ref_fasta -- Path to reference fasta file.
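The required inputs above can be collected into an inputs JSON. The following is a minimal sketch, not part of the original workflows: the helper names, the file names, and the rule that a BAM and its index share a base name are all assumptions made for illustration.

```python
import json
from pathlib import PurePosixPath


def sample_name(path):
    """Strip directories and a trailing .bai and/or .bam from a file path."""
    name = PurePosixPath(path).name
    for suffix in (".bai", ".bam"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name


def build_panel_inputs(normal_bams, normal_bais, intervals, pon_entity_id,
                       ref_fasta, ref_fasta_fai, ref_fasta_dict,
                       gatk_docker="broadinstitute/gatk:latest"):
    """Return the required CNVSomaticPanelWorkflow inputs as a dict."""
    if len(normal_bams) != len(normal_bais):
        raise ValueError("normal_bams and normal_bais must have the same length")
    # Assumption: a BAM and its index share a base name (Sample1.bam / Sample1.bai).
    for bam, bai in zip(normal_bams, normal_bais):
        if sample_name(bam) != sample_name(bai):
            raise ValueError(f"{bai} does not look like the index of {bam}")
    values = {
        "gatk_docker": gatk_docker,
        "intervals": intervals,
        "normal_bais": list(normal_bais),
        "normal_bams": list(normal_bams),
        "pon_entity_id": pon_entity_id,
        "ref_fasta_dict": ref_fasta_dict,
        "ref_fasta_fai": ref_fasta_fai,
        "ref_fasta": ref_fasta,
    }
    return {f"CNVSomaticPanelWorkflow.{key}": value for key, value in values.items()}


# Hypothetical file names, for illustration only.
panel_inputs = build_panel_inputs(
    normal_bams=["Sample1.bam", "Sample2.bam"],
    normal_bais=["Sample1.bai", "Sample2.bai"],
    intervals="targets.interval_list",
    pon_entity_id="example_pon",
    ref_fasta="reference.fasta",
    ref_fasta_fai="reference.fasta.fai",
    ref_fasta_dict="reference.dict",
)
print(json.dumps(panel_inputs, indent=2))
```

Checking the BAM/BAI pairing before submission catches a reordered list early, rather than partway through a workflow run.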

In addition, there are optional workflow-level and task-level parameters that may be set by advanced users; for example:

  • CNVSomaticPanelWorkflow.do_explicit_gc_correction -- (optional) If true, perform explicit GC-bias correction when creating PoN and in subsequent denoising of case samples. If false, rely on PCA-based denoising to correct for GC bias.
  • CNVSomaticPanelWorkflow.PreprocessIntervals.bin_length -- Size of bins (in bp) for coverage collection. This must be the same value used for all case samples.
  • CNVSomaticPanelWorkflow.PreprocessIntervals.padding -- Amount of padding (in bp) to add to both sides of targets for WES coverage collection. This must be the same value used for all case samples.

Further explanation of other task-level parameters may be found by invoking the --help documentation available in the gatk.jar for each tool.
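Because bin_length and padding must match between the PoN and every case sample, it can help to compare the two inputs files before running. The sketch below is one possible check, under stated assumptions: the function names are hypothetical, the numeric values are illustrative, and the pair-workflow key names are guessed by analogy with the panel keys listed above, so confirm them against the WDL. Parameters are matched by their final dotted component so the check does not depend on the exact prefix.

```python
def shared_settings(inputs, names=("bin_length", "padding")):
    """Collect values of inputs whose last dotted component is in `names`."""
    found = {}
    for key, value in inputs.items():
        leaf = key.rsplit(".", 1)[-1]
        if leaf in names:
            found[leaf] = value
    return found


def pon_case_mismatches(panel_inputs, case_inputs):
    """Return {name: (panel_value, case_value)} for settings that differ.

    A setting that is explicit on one side and unset on the other is
    reported as well (with None for the unset side), since the tool
    default may not equal the explicit value.
    """
    panel = shared_settings(panel_inputs)
    case = shared_settings(case_inputs)
    mismatches = {}
    for name in sorted(set(panel) | set(case)):
        if panel.get(name) != case.get(name):
            mismatches[name] = (panel.get(name), case.get(name))
    return mismatches


# Illustrative values; the pair-workflow key names are assumptions.
panel_inputs = {
    "CNVSomaticPanelWorkflow.PreprocessIntervals.bin_length": 1000,
    "CNVSomaticPanelWorkflow.PreprocessIntervals.padding": 250,
}
case_inputs = {
    "CNVSomaticPairWorkflow.PreprocessIntervals.bin_length": 500,
    "CNVSomaticPairWorkflow.PreprocessIntervals.padding": 250,
}
mismatches = pon_case_mismatches(panel_inputs, case_inputs)
print(mismatches)
```

An empty result means the two files agree on both settings; any entry points to a case sample that would be denoised against a PoN built with different bins.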

Outputs

  • Read-count PoN in HDF5 format
  • Additional metrics

cnv_somatic_pair_workflow :

Runs a matched tumor-normal pair to obtain somatic copy number variants.

Requirements/Expectations

The reference and bins (if specified) must be the same between PoN and case samples.

  • CNVSomaticPairWorkflow.common_sites -- Picard or GATK-style interval list of common sites to use for collecting allelic counts.
  • CNVSomaticPairWorkflow.gatk_docker -- GATK Docker image (e.g., broadinstitute/gatk:latest).
  • CNVSomaticPairWorkflow.intervals -- Picard or GATK-style interval list. For WGS, this should typically only include the autosomal chromosomes.
  • CNVSomaticPairWorkflow.normal_bam -- Path to normal BAM file.
  • CNVSomaticPairWorkflow.normal_bam_idx -- Path to normal BAM file index.
  • CNVSomaticPairWorkflow.read_count_pon -- Path to read-count PoN created by the panel workflow.
  • CNVSomaticPairWorkflow.ref_fasta_dict -- Path to reference dict file.
  • CNVSomaticPairWorkflow.ref_fasta_fai -- Path to reference fasta fai file.
  • CNVSomaticPairWorkflow.ref_fasta -- Path to reference fasta file.
  • CNVSomaticPairWorkflow.tumor_bam -- Path to tumor BAM file.
  • CNVSomaticPairWorkflow.tumor_bam_idx -- Path to tumor BAM file index.

In addition, there are several task-level parameters that may be set by advanced users, as above.

To invoke Oncotator on the called tumor copy-ratio segments:

  • CNVSomaticPairWorkflow.is_run_oncotator -- (optional) If true, run Oncotator on the called copy-ratio segments. This will generate both a simple TSV and a gene list.

Further explanation of these task-level parameters may be found by invoking the --help documentation available in the gatk.jar for each tool.
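The pair-workflow inputs listed above can be assembled the same way. This is a minimal sketch with hypothetical helper and file names; it only checks that every required key is present and adds is_run_oncotator when requested.

```python
import json

# Required input names, taken from the list above.
REQUIRED_PAIR_KEYS = (
    "common_sites", "gatk_docker", "intervals",
    "normal_bam", "normal_bam_idx", "read_count_pon",
    "ref_fasta_dict", "ref_fasta_fai", "ref_fasta",
    "tumor_bam", "tumor_bam_idx",
)


def build_pair_inputs(values, run_oncotator=False):
    """Return CNVSomaticPairWorkflow inputs, failing on missing keys."""
    missing = [key for key in REQUIRED_PAIR_KEYS if key not in values]
    if missing:
        raise ValueError("missing required inputs: " + ", ".join(missing))
    inputs = {f"CNVSomaticPairWorkflow.{key}": values[key]
              for key in REQUIRED_PAIR_KEYS}
    if run_oncotator:
        # Optional flag described above; omitted unless requested.
        inputs["CNVSomaticPairWorkflow.is_run_oncotator"] = True
    return inputs


# Hypothetical file names, for illustration only.
pair_values = {
    "common_sites": "common_sites.interval_list",
    "gatk_docker": "broadinstitute/gatk:latest",
    "intervals": "targets.interval_list",
    "normal_bam": "normal.bam",
    "normal_bam_idx": "normal.bai",
    "read_count_pon": "example_pon.hdf5",
    "ref_fasta_dict": "reference.dict",
    "ref_fasta_fai": "reference.fasta.fai",
    "ref_fasta": "reference.fasta",
    "tumor_bam": "tumor.bam",
    "tumor_bam_idx": "tumor.bai",
}
pair_inputs = build_pair_inputs(pair_values, run_oncotator=True)
print(json.dumps(pair_inputs, indent=2))
```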

Outputs

  • Modeled segments for tumor and normal
  • Modeled segments plot for tumor and normal
  • Denoised copy ratios for tumor and normal
  • Denoised copy ratios plot for tumor and normal
  • Denoised copy ratios plot with copy ratio limited to 4 (lim 4) for tumor and normal
  • Additional metrics

Software version requirements :

GATK 4.1.4.0

Cromwell version support

  • Successfully tested on v45
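For a local run, a Cromwell command line can be assembled as in the sketch below. The jar name, WDL file name, and inputs file name are assumptions for illustration; the `run` subcommand and `--inputs` option are Cromwell's standard command-line interface.

```python
import shlex


def cromwell_run_command(wdl_path, inputs_json, cromwell_jar="cromwell-45.jar"):
    """Build the argument list for a local `cromwell run` invocation."""
    return ["java", "-jar", cromwell_jar, "run", wdl_path, "--inputs", inputs_json]


# Hypothetical file names; adjust to where the WDL and inputs JSON live.
cmd = cromwell_run_command("cnv_somatic_panel_workflow.wdl", "panel_inputs.json")
print(shlex.join(cmd))
```

To actually launch the workflow, the list can be passed to `subprocess.run(cmd, check=True)`.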

Important Notes :

Contact Us :

  • The following material is provided by the Data Science Platform group at the Broad Institute. Please direct any questions or concerns to one of our forum sites: GATK or Terra.

LICENSING :

Copyright Broad Institute, 2019 | BSD-3

This script is released under the WDL open source code license (BSD-3) (full license text at https://github.com/openwdl/wdl/blob/master/LICENSE). Note, however, that the programs it calls may be subject to different licenses. Users are responsible for checking that they are authorized to run all programs before running this script.