gCNV validation

The GenomeStrip IntensityRankSumAnnotator (IRS) tool is used to perform in-silico validation of copy number variants (CNVs) in the UK Biobank (UKBB) dataset using SNP array intensity data.

Deployment and execution:

Data:

  • Path to a directory of BED files containing the UKBB gCNV output (one file per chromosome).
    • VCF header template.
  • List of UKBB SNP array files in VCF format.
  • List of samples on which to run GenomeStrip IRS.
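
The list inputs above can be assembled by hand, but a small helper keeps them reproducible. The following is a minimal sketch, not part of the repository: the function name `make_file_list`, the `*.vcf.gz` pattern and the output file name are assumptions for illustration, and the exact list format the workflow expects should be checked against the WDL inputs.

```shell
# Sketch (hypothetical helper): write one path per line for every file
# in a directory whose name matches a glob pattern.
make_file_list() {
    local dir="$1" pattern="$2" out="$3"
    # -maxdepth 1 restricts the search to the top level of the directory;
    # sort gives a deterministic order across runs.
    find "$dir" -maxdepth 1 -type f -name "$pattern" | sort > "$out"
    # Fail loudly if nothing matched, since an empty list is rarely intended.
    if [ ! -s "$out" ]; then
        echo "no files matching '$pattern' in $dir" >&2
        return 1
    fi
}

# Example with placeholder paths (assumed layout, adjust to your data):
# make_file_list /path/to/array_vcfs '*.vcf.gz' array_vcfs.list
```

Passing an absolute directory path yields absolute paths in the list, which is usually what a workflow running elsewhere needs.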

The main scripts to run this analysis are:

  • ukbbValidation.wdl: this workflow reformats SNP array and gCNV data from the UKBB and calls GenomeStrip IRS for in-silico CNV validation.
  • genomeStripIRS.wdl: runs GenomeStrip IRS and can be executed on its own.

Execution

> git clone https://github.com/talkowski-lab/cnv-validation.git
> cd cnv-validation/wdl
> zip dependencies.zip *

> cromshell submit ukbbValidation.wdl /path/to/array-validation.json /path/to/config.json dependencies.zip
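
Before submitting, it can help to confirm that every file named on the command line actually exists. The following is a minimal sketch under that assumption; `check_submit_inputs` is a hypothetical helper, not something provided by cromshell or this repository.

```shell
# Sketch (hypothetical helper): confirm each file that will be passed to
# `cromshell submit` exists and is non-empty, reporting all problems at once.
check_submit_inputs() {
    local f missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    # Return 0 only if every file passed the check.
    return "$missing"
}

# Example using the placeholder paths from the commands above:
# check_submit_inputs ukbbValidation.wdl /path/to/array-validation.json \
#     /path/to/config.json dependencies.zip \
#   && cromshell submit ukbbValidation.wdl /path/to/array-validation.json \
#     /path/to/config.json dependencies.zip
```

The check only tests for presence and non-zero size; it does not validate the JSON or the WDL itself.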

Copyright (c) 2022 Talkowski Lab and The Broad Institute of M.I.T. and Harvard
Contact: Alba Sanchis-Juan

SV aggregation team: Ryan Collins, Jack Fu, Isaac Wong, Alba Sanchis-Juan and Harrison Brand on behalf of the Talkowski Laboratory