Mango

ChIA-PET Analysis Software

Citation

Phanstiel DH, Boyle AP, Heidari N, Snyder MP. Novel ChIA-PET analysis method reveals two major classes of chromatin interactions. In preparation.

Mango Installation

  1. Mango depends on the following R packages:
     1. hash
     2. Rcpp
     3. optparse

They can be installed through CRAN. For example, to install all three, open R and type the following:

install.packages('hash')
install.packages('Rcpp')
install.packages('optparse')

  2. Mango depends on the following software packages, which should be installed and included in the system PATH before using Mango:
     1. Bowtie (http://bowtie-bio.sourceforge.net)
     2. Bedtools (https://github.com/arq5x/bedtools2)
     3. MACS2 (https://github.com/taoliu/MACS)
  3. Once the dependencies are installed, Mango can be installed from the command line using the following commands:

git clone https://github.com/dphansti/mango.git
R CMD INSTALL --no-multiarch --with-keep.source mango

Features

Mango uses FASTQ files generated by Illumina sequencers to call peaks and interactions from ChIA-PET experiments. Arguments can be passed to Mango through a configuration file, the command line, or a combination of both. When an argument is supplied both on the command line and in a configuration file, the command-line value takes precedence.

Usage of Mango

Rscript mango.R [-options]

Example of regular interaction calling

Rscript mango.R --fastq1 samplename_1.fastq --fastq2 samplename_2.fastq --prefix samplename --argfile argfile.txt \
   --chromexclude chrM,chrY --stages 1:5

Example of an argfile

bowtieref         = /path/to/hg19
bedtoolsgenome    = /path/to/human.hg19.genome
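To make the precedence rule from Features concrete, here is a minimal C++ sketch that reads an argfile in the `key = value` format shown above and then lets command-line values override it. The helpers `trim`, `parseArgfile`, and `mergeArgs` are hypothetical illustrations, not Mango's own argument handling, and skipping lines that contain no `=` is an assumption.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Remove leading and trailing spaces, tabs, and carriage returns.
std::string trim(const std::string& s) {
    const auto first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);
}

// Hypothetical parser: reads "key = value" lines; lines without '=' (including
// blank lines) are skipped, which is an assumption about the argfile format.
std::map<std::string, std::string> parseArgfile(std::istream& in) {
    std::map<std::string, std::string> args;
    std::string line;
    while (std::getline(in, line)) {
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;
        const std::string key = trim(line.substr(0, eq));
        if (!key.empty()) args[key] = trim(line.substr(eq + 1));
    }
    return args;
}

// Merge argfile values with command-line values; on a clash the command line wins.
std::map<std::string, std::string> mergeArgs(
        std::map<std::string, std::string> fromFile,
        const std::map<std::string, std::string>& fromCommandLine) {
    for (const auto& kv : fromCommandLine) {
        fromFile[kv.first] = kv.second;  // overwrite or add the argfile entry
    }
    return fromFile;
}
```

For instance, supplying `bowtieref` on the command line in addition to the argfile above would replace only that entry, while `bedtoolsgenome` would keep its argfile value.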

Parameters

ALL STAGES

stages
Stages of the pipeline to execute. This can be either a single stage (e.g. 1) or a range of stages (e.g. 1:5). default = 1:5
prefix
prefix for all output files. default = mango
outdir
The output directory. default = NULL
bowtieref
genome reference file for bowtie
bedtoolsgenome
bedtools genome file
chrominclude
Comma-separated list of chromosomes to use (e.g. chr1,chr2,chr3,...). Only these chromosomes will be processed. If NULL, all chromosomes will be processed. default = NULL
chromexclude
Comma-separated list of chromosomes to exclude (e.g. chrM,chrY). If NULL, all chromosomes will be processed. default = NULL
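The `stages`, `chrominclude`, and `chromexclude` formats above can be sketched in C++ as follows. `parseStages`, `splitCommas`, and `filterChromosomes` are hypothetical helpers; an empty string stands in for NULL, and applying the exclude list after the include list when both are given is an assumption rather than documented Mango behavior.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Expand a stage specification such as "3" or "1:5" into {3} or {1, 2, 3, 4, 5}.
std::vector<int> parseStages(const std::string& spec) {
    const auto colon = spec.find(':');
    const int first = std::stoi(spec.substr(0, colon));
    const int last = (colon == std::string::npos) ? first
                                                  : std::stoi(spec.substr(colon + 1));
    std::vector<int> stages;
    for (int s = first; s <= last; ++s) stages.push_back(s);
    return stages;
}

// Split a comma-separated list such as "chrM,chrY"; an empty string (NULL) yields an empty set.
std::set<std::string> splitCommas(const std::string& list) {
    std::set<std::string> items;
    std::stringstream ss(list);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (!item.empty()) items.insert(item);
    }
    return items;
}

// Keep chromosomes allowed by the include list (empty = all) and not named in the exclude list.
std::vector<std::string> filterChromosomes(const std::vector<std::string>& chroms,
                                           const std::string& chrominclude,
                                           const std::string& chromexclude) {
    const auto keep = splitCommas(chrominclude);
    const auto drop = splitCommas(chromexclude);
    std::vector<std::string> result;
    for (const auto& c : chroms) {
        const bool included = keep.empty() || keep.count(c) > 0;
        if (included && drop.count(c) == 0) result.push_back(c);
    }
    return result;
}
```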

STAGE 1 PARAMETERS

linkerA
linker sequence to look for. default = GTTGGATAAG
linkerB
linker sequence to look for. default = GTTGGAATGT
minlength
min length of reads after linker trimming. default = 15
maxlength
max length of reads after linker trimming. default = 25
keepempty
Should reads with no linker be kept (TRUE or FALSE). default = FALSE
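A rough C++ sketch of the stage 1 trimming rule, assuming an exact match of `linkerA` (checked first) or `linkerB`, keeping only the bases before the linker, and applying `minlength`/`maxlength` to the trimmed sequence. `TrimResult` and `trimRead` are hypothetical; Mango's actual trimming may tolerate mismatches, process both mates of a PET together, or treat reads without a linker differently.

```cpp
#include <cassert>
#include <string>

// Outcome of trimming one read. 'linker' is 'A', 'B', or '-' when no linker was found.
struct TrimResult {
    bool kept;
    std::string sequence;
    char linker;
};

// Hypothetical sketch: exact-match search for linkerA, then linkerB; keep the bases
// upstream of the linker if their length lies in [minlength, maxlength].
TrimResult trimRead(const std::string& read,
                    const std::string& linkerA = "GTTGGATAAG",
                    const std::string& linkerB = "GTTGGAATGT",
                    std::size_t minlength = 15,
                    std::size_t maxlength = 25,
                    bool keepempty = false) {
    std::size_t pos = read.find(linkerA);
    char linker = 'A';
    if (pos == std::string::npos) {
        pos = read.find(linkerB);
        linker = 'B';
    }
    if (pos == std::string::npos) {
        // No linker: keep the untrimmed read only when keepempty is TRUE
        // (skipping the length check here is an assumption).
        return {keepempty, keepempty ? read : std::string(), '-'};
    }
    const std::string trimmed = read.substr(0, pos);
    const bool inRange = trimmed.size() >= minlength && trimmed.size() <= maxlength;
    return {inRange, inRange ? trimmed : std::string(), linker};
}
```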

STAGE 2 PARAMETERS

shortreads
Should Bowtie alignments use parameters tuned for very short reads (~20 bp) (TRUE or FALSE). default = TRUE

STAGE 4 PARAMETERS

MACS_qvalue
q-value cutoff for peak calling in MACS2. default = 0.05
MACS_shiftsize
MACS shift size. If NULL, MACS determines it.
peakslop
Number of base pairs to extend peaks on both sides. default = 500
peakinput
Name of a user-supplied peaks file. If NULL, Mango will use peaks determined by MACS2. default = NULL
blacklist
BED file of regions to remove from MACS peaks
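The `peakslop` and `blacklist` options can be illustrated with a short C++ sketch over BED-style half-open intervals. Peaks that overlap any blacklist region are dropped, and the survivors are extended by `peakslop` bp on each side, clamped to chromosome lengths such as those in the `bedtoolsgenome` file. `Region`, `slopPeak`, `overlaps`, and `preparePeaks` are hypothetical, and the order of the two steps is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A BED-style interval: 0-based start, end exclusive.
struct Region {
    std::string chrom;
    long start;
    long end;
};

// Extend a peak by 'slop' bp on both sides, clamped to [0, chromosome length],
// similar in spirit to `bedtools slop -b` with a genome file.
Region slopPeak(const Region& peak, long slop, const std::map<std::string, long>& genome) {
    Region out = peak;
    out.start = std::max(0L, peak.start - slop);
    const auto it = genome.find(peak.chrom);
    out.end = (it == genome.end()) ? peak.end + slop : std::min(it->second, peak.end + slop);
    return out;
}

// Two half-open intervals overlap when they share at least one base.
bool overlaps(const Region& a, const Region& b) {
    return a.chrom == b.chrom && a.start < b.end && b.start < a.end;
}

// Drop blacklisted peaks, then apply the slop (this ordering is an assumption).
std::vector<Region> preparePeaks(const std::vector<Region>& peaks,
                                 const std::vector<Region>& blacklist,
                                 long peakslop,
                                 const std::map<std::string, long>& genome) {
    std::vector<Region> result;
    for (const auto& p : peaks) {
        const bool blacklisted = std::any_of(blacklist.begin(), blacklist.end(),
            [&p](const Region& b) { return overlaps(p, b); });
        if (!blacklisted) result.push_back(slopPeak(p, peakslop, genome));
    }
    return result;
}
```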

STAGE 5 PARAMETERS

distcutrangemin
When Mango determines the self-ligation cutoff, this is the minimum distance it will consider. default = 1000
distcutrangemax
When Mango determines the self-ligation cutoff, this is the maximum distance it will consider. default = 100000
biascut
Mango excludes very short-distance PETs, since they tend to arise from self-ligation of a single DNA fragment rather than inter-ligation of two interacting fragments. To determine this distance cutoff, Mango estimates the fraction of PETs at each distance that come from self-ligation and sets the cutoff at the point where this fraction is less than or equal to BIASCUT. default = 0.05
FDR
FDR cutoff for significant interactions. default = 0.01
numofbins
number of bins to use for binomial p-value calculations. default = 50
corrMethod
Method to use for multiple hypothesis testing correction. See (http://stat.ethz.ch/R-manual/R-devel/library/stats/html/p.adjust.html) for more details. default = BH
maxinteractingdist
The maximum distance (in base pairs) considered for interaction. default = 1000000
extendreads
Number of base pairs to extend reads towards the peak. default = 120
minPETS
The minimum number of PETs required for an interaction (applied after FDR filtering). default = 2
reportallpairs
Should all pairs be reported or just significant pairs (TRUE or FALSE). default = FALSE
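The stage 5 filters can be sketched in C++ in three steps: pick the self-ligation distance cutoff from per-distance self-ligation fractions (`biascut`, `distcutrangemin`, `distcutrangemax`), apply Benjamini-Hochberg correction (the default `corrMethod`, written to mirror R's `p.adjust(p, method = "BH")`), and keep interactions that pass `FDR` and then `minPETS`. All function and struct names are hypothetical. The self-ligation fractions and binomial p-values are taken as inputs, the fallback when no distance passes `biascut` is an assumption, and correcting only over candidates inside the distance window is also assumed rather than taken from Mango's code.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Smallest distance in [distcutrangemin, distcutrangemax] whose estimated self-ligation
// fraction is <= biascut. 'distances' is sorted ascending and paired with 'fractions'.
long chooseDistanceCutoff(const std::vector<long>& distances,
                          const std::vector<double>& fractions,
                          long distcutrangemin = 1000,
                          long distcutrangemax = 100000,
                          double biascut = 0.05) {
    assert(distances.size() == fractions.size());
    for (std::size_t i = 0; i < distances.size(); ++i) {
        if (distances[i] < distcutrangemin || distances[i] > distcutrangemax) continue;
        if (fractions[i] <= biascut) return distances[i];
    }
    return distcutrangemax;  // assumed fallback when no distance qualifies
}

// Benjamini-Hochberg adjustment: q_(i) = min over j >= i of (n / j) * p_(j), capped at 1.
std::vector<double> adjustBH(const std::vector<double>& p) {
    const std::size_t n = p.size();
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), 0);
    // Visit p-values from largest to smallest so a running minimum gives the cummin.
    std::sort(order.begin(), order.end(),
              [&p](std::size_t a, std::size_t b) { return p[a] > p[b]; });
    std::vector<double> q(n);
    double running = 1.0;
    for (std::size_t k = 0; k < n; ++k) {
        const std::size_t rank = n - k;  // 1-based rank in ascending order
        running = std::min(running, p[order[k]] * n / rank);
        q[order[k]] = running;
    }
    return q;
}

// One candidate interaction between two peaks (hypothetical structure).
struct Interaction {
    long distance;  // bp between the two anchors
    int pets;       // number of PETs linking the anchors
    double pvalue;  // binomial p-value computed elsewhere
};

// Distance window, then BH correction, then FDR; minPETS does not change the tested set.
std::vector<Interaction> filterInteractions(const std::vector<Interaction>& all,
                                            long distanceCutoff,
                                            long maxinteractingdist = 1000000,
                                            double fdr = 0.01,
                                            int minPETS = 2) {
    std::vector<Interaction> inRange;
    for (const auto& x : all) {
        if (x.distance >= distanceCutoff && x.distance <= maxinteractingdist) inRange.push_back(x);
    }
    std::vector<double> p;
    for (const auto& x : inRange) p.push_back(x.pvalue);
    const std::vector<double> q = adjustBH(p);
    std::vector<Interaction> kept;
    for (std::size_t i = 0; i < inRange.size(); ++i) {
        if (q[i] <= fdr && inRange[i].pets >= minPETS) kept.push_back(inRange[i]);
    }
    return kept;
}
```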