CNV-JACG (v1.1)

CNV-JACG is a random-forest-based framework for assessing the accuracy of CNVs detected from paired-end whole-genome sequencing data.

Getting Started

CNV-JACG is written in Perl and R.

Prerequisites

Perl (v5.18.2) and the following Perl modules

If you encounter an error like "perl: symbol lookup error: */perl5/lib/perl5/x86_64-linux-thread-multi//auto/Clone/Clone.so: undefined symbol: Perl_xs_handshake", it means the last two packages were not successfully installed for the Perl you are currently using.

R (3.5.1) and the following R package

  • randomForest

If you encounter the error "Error in library(randomForest) : there is no package called 'randomForest' Execution halted", the package was not installed successfully.

Other required software

  • Bedtools(v2.25.0)
  • Samtools(1.3.1)

Installing

wget https://github.com/sunnyzxh/CNV-JACG/archive/v1.1.zip
unzip v1.1.zip
cd CNV-JACG-1.1

Running the test

cd bin
chmod 777 samtools
chmod 777 bedtools
cd ../example
sh test.sh
cd test.result

If the message "ALL JOBS RUN SUCCESSFULLY!" is printed, there should be 10 files under test.result.
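The file count can also be checked programmatically. The following is a minimal sketch, assuming the 10 files sit directly under test.result (not in subdirectories); the function name `count_result_files` is hypothetical and not part of CNV-JACG.

```python
from pathlib import Path


def count_result_files(result_dir):
    """Count regular files directly under result_dir (subdirectories are ignored)."""
    return sum(1 for entry in Path(result_dir).iterdir() if entry.is_file())


if __name__ == "__main__":
    # Run from inside the example directory after test.sh has finished.
    n = count_result_files("test.result")
    print(f"{n} files found (the README expects 10)")
```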

Note: Because this is a demo, and because of GitHub's file size limits, we provide only a small portion of the following files. For your real data, you need to replace them with genome-wide versions. If you cannot find these files yourself, you can contact sunnyzxh@connect.hku.hk to request the complete files.

  • human reference genome (hg19)
  • SNPs with MAF > 5% from the 1000 Genomes Project
  • RepeatMasker and segmental duplication coordinates

Usage

Please prepare the following inputs

  • The tab-separated file containing the coordinates of putative CNVs, "Chr\tStart\tEnd\tDEL/DUP\tother" ($preCNV)
  • The bam file(s)
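Before running the tool, it may help to sanity-check the putative CNV file. The sketch below parses one record in the "Chr\tStart\tEnd\tDEL/DUP\tother" layout described above. It is not part of CNV-JACG: the function name `parse_precnv_line` is hypothetical, and two checks are assumptions rather than documented requirements (that the "other" column may be absent, and that Start must be smaller than End).

```python
def parse_precnv_line(line):
    """Parse one tab-separated record: Chr, Start, End, DEL/DUP, other..."""
    fields = line.rstrip("\n").split("\t")
    # Assumption: the first four columns are mandatory, "other" is optional.
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 tab-separated fields, got {len(fields)}")
    chrom, start, end, cnv_type = fields[:4]
    start, end = int(start), int(end)  # int() raises ValueError on non-numeric input
    # Assumption: a valid interval has Start < End.
    if start >= end:
        raise ValueError(f"start ({start}) is not smaller than end ({end})")
    if cnv_type not in ("DEL", "DUP"):
        raise ValueError(f"type must be DEL or DUP, got {cnv_type!r}")
    return chrom, start, end, cnv_type, fields[4:]
```

Calling the function on every line of the file and reporting the first `ValueError` is usually enough to catch a space-separated file or a misspelled CNV type.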

Note:

  • Please DO NOT mix deletions and duplications within one file; doing so causes the deletion classifier to be used to predict duplications, and vice versa.
  • The bam file should contain the “SA” tag, which can be generated by BWA “mem”.
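Both notes can be checked with a few lines of code. The sketch below is an illustration, not part of CNV-JACG: `split_by_type` separates a mixed list of records into deletions and duplications so that each group can be written to its own file, and `has_sa_tag` scans SAM-format text lines for an `SA:Z:` optional field. Both function names are hypothetical. The SAM lines are assumed to come from something like `samtools view` on your bam file.

```python
def split_by_type(lines):
    """Group putative CNV records by the type in column 4 (DEL or DUP)."""
    groups = {"DEL": [], "DUP": []}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[3] not in groups:
            raise ValueError(f"cannot determine DEL/DUP for record: {line!r}")
        groups[fields[3]].append(line)
    return groups


def has_sa_tag(sam_lines):
    """Return True if any alignment line carries an SA:Z: optional field."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        # SAM has 11 mandatory columns; optional TAG:TYPE:VALUE fields follow.
        optional = line.rstrip("\n").split("\t")[11:]
        if any(field.startswith("SA:Z:") for field in optional):
            return True
    return False
```

Only reads with supplementary (split) alignments carry the SA tag, so a `False` result on a small sample of reads does not prove the tag is missing from the whole file.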

Running

perl CNV-JACG.pl

Function

Assess the accuracy of putative CNVs from Illumina paired-end WGS data

Usage

perl CNV-JACG.pl -p example.precnv -b example.bam,example1.bam -r ref.fa -o out

Options

-h|-help            help

-p|-precnv   [s]    putative CNVs file (chr,start,end,type(DEL|DUP),other. tab-separated)

-b|-bam      [s]    bam file(s) (for multiple bam files, separate them with commas <-b bam1,bam2>)

-r|-ref      [s]    reference genome

-o|-outdir   [s]    output dir
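When CNV-JACG is called from a pipeline, the command line can be assembled programmatically. The sketch below builds the argument list from the options listed above; the helper name `build_command` is hypothetical, and rejecting bam paths that contain a comma is an assumption based on the comma being the separator for -b.

```python
def build_command(precnv, bams, ref, outdir, script="CNV-JACG.pl"):
    """Return the argument list for one CNV-JACG run."""
    if not bams:
        raise ValueError("at least one bam file is required")
    for bam in bams:
        # Assumption: a comma inside a path would be read as a separator.
        if "," in bam:
            raise ValueError(f"bam path must not contain a comma: {bam!r}")
    return ["perl", script,
            "-p", precnv,
            "-b", ",".join(bams),
            "-r", ref,
            "-o", outdir]


if __name__ == "__main__":
    cmd = build_command("example.precnv", ["example.bam", "example1.bam"], "ref.fa", "out")
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to launch the run and raise an exception on a non-zero exit status.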

Outputs

  • *het.prob
  • *repeat
  • *feature
  • *RFM.txt
  • *RFM.prediction (This is the final result, containing the prediction result in the first column, followed by other information)
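A quick summary of the final result can be obtained by counting the values in the first column of the *RFM.prediction file. The sketch below is an illustration under stated assumptions: columns are taken to be whitespace-separated, the actual prediction labels are not documented here, and a header line, if the file has one, would be counted like any other value. The function name `tally_predictions` is hypothetical.

```python
from collections import Counter


def tally_predictions(lines):
    """Count how often each value appears in the first column."""
    counts = Counter()
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated columns
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts
```

For example, `with open(path) as fh: print(tally_predictions(fh))`, where `path` points to your *RFM.prediction file, prints one count per distinct first-column value.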

Support

Should you have any questions, please feel free to contact sunnyzxh@connect.hku.hk