Mango

ChIA-PET Analysis Software

Citation

Novel ChIA-PET analysis method reveals two major classes of chromatin interactions. Phanstiel DH, Boyle AP, Heidari N, Snyder MP. In preparation.

Mango Installation

  1. Mango depends on the following R packages.
     1. hash
     2. Rcpp
     3. optparse

They can be installed through CRAN. For example, to install all three packages, open R and type the following:

install.packages('hash')
install.packages('Rcpp')
install.packages('optparse')
  2. Mango depends on the following software packages, which should be installed and included in the system PATH prior to using Mango.
     1. Bowtie (http://bowtie-bio.sourceforge.net)
     2. Bedtools (https://github.com/arq5x/bedtools2)
     3. MACS2 (https://github.com/taoliu/MACS)
  3. Once the dependencies are installed, Mango can be installed from the command line using the following commands.
git clone https://github.com/dphansti/mango.git
R CMD INSTALL --no-multiarch --with-keep.source mango

Features

Mango uses fastq files generated by Illumina sequencers to call peaks and interactions from ChIA-PET experiments. Arguments can be passed to Mango through a configuration file, through the command line, or through a combination of both. When an argument is supplied both on the command line and in a configuration file, the command-line value takes precedence.
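
The precedence rule can be pictured as a layered merge of argument sources. The following is a minimal Python sketch of that idea, not Mango's actual code; the function name `resolve_args` and the sample values are hypothetical.

```python
def resolve_args(defaults, argfile_args, cli_args):
    """Merge three argument sources; later sources override earlier ones."""
    merged = dict(defaults)
    merged.update(argfile_args)  # configuration file overrides built-in defaults
    merged.update(cli_args)      # command line overrides the configuration file
    return merged


# Hypothetical values, chosen only to show the override order.
defaults = {"prefix": "mango", "stages": "1:5"}
argfile_args = {"prefix": "from_argfile", "bowtieref": "/path/to/hg19"}
cli_args = {"prefix": "samplename"}

resolved = resolve_args(defaults, argfile_args, cli_args)
print(resolved)
```

In this sketch `prefix` is set in all three places and ends up with the command-line value, while `bowtieref` and `stages` keep the values from the only source that defines them.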

Usage of Mango

Rscript mango.R [options]

Example of a regular interaction-calling run:

Rscript mango.R --fastq1 samplename_1.fastq --fastq2 samplename_2.fastq --prefix samplename --argfile argfile.txt \
   --chromexclude chrM,chrY --stages 1:5

Example of an argfile:

bowtieref         = /path/to/hg19
bedtoolsgenome    = /path/to/human.hg19.genome
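
To make the argfile layout concrete, here is a small Python sketch that reads such a file into a dictionary. It assumes one `name = value` pair per line, as in the example above; the function name `parse_argfile` is hypothetical and this is not the parser Mango itself uses.

```python
def parse_argfile(text):
    """Parse 'name = value' lines into a dict, skipping lines without '='."""
    args = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)  # split on the first '=' only
        args[name.strip()] = value.strip()
    return args


example = """
bowtieref         = /path/to/hg19
bedtoolsgenome    = /path/to/human.hg19.genome
"""
parsed = parse_argfile(example)
print(parsed)
```

Whitespace around the `=` is stripped, so the column alignment used in the example file is purely cosmetic in this sketch.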

Parameters

ALL STAGES

stages

stages of the pipeline to execute. Can be either a single stage (e.g. 1) or a range of stages (e.g. 1:5). default = 1:5
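
As an illustration of the two accepted forms, the Python sketch below expands a stage specification into a list of stage numbers. The helper `parse_stages` is hypothetical and only mirrors the description above; it is not taken from Mango.

```python
def parse_stages(spec):
    """Expand '3' to [3] and '1:5' to [1, 2, 3, 4, 5]."""
    if ":" in spec:
        first, last = (int(part) for part in spec.split(":"))
        return list(range(first, last + 1))  # the range is inclusive on both ends
    return [int(spec)]


print(parse_stages("1:5"))
print(parse_stages("3"))
```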

prefix

prefix for all output files. default = mango

outdir

The output directory. default = NULL

bowtieref

genome reference file for bowtie

bedtoolsgenome

bedtools genome file

chrominclude

comma-separated list of chromosomes to use (e.g. chr1,chr2,chr3,...). Only these chromosomes will be processed. If NULL, all chromosomes will be processed. default = NULL

chromexclude

comma-separated list of chromosomes to exclude (e.g. chrM,chrY). If NULL, no chromosomes are excluded. default = NULL
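
The two chromosome filters can be sketched in Python as follows. This is an illustration only: the function `select_chromosomes` is hypothetical, and the behavior when both lists are given (include first, then exclude) is an assumption not stated in this README.

```python
def select_chromosomes(chromosomes, chrominclude=None, chromexclude=None):
    """Apply comma-separated include/exclude lists; None means no restriction."""
    include = set(chrominclude.split(",")) if chrominclude else None
    exclude = set(chromexclude.split(",")) if chromexclude else set()
    return [
        chrom for chrom in chromosomes
        if (include is None or chrom in include) and chrom not in exclude
    ]


all_chroms = ["chr1", "chr2", "chrX", "chrY", "chrM"]
kept = select_chromosomes(all_chroms, chromexclude="chrM,chrY")
print(kept)
```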

STAGE 1 PARAMETERS

linkerA

linker sequence to look for. default = GTTGGATAAG

linkerB

linker sequence to look for. default = GTTGGAATGT

minlength

min length of reads after linker trimming. default = 15

maxlength

max length of reads after linker trimming. default = 25

keepempty

Should reads with no linker be kept (TRUE or FALSE). default = FALSE
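
The stage 1 parameters interact roughly as in the Python sketch below. It is a simplified illustration under stated assumptions, not Mango's implementation: it assumes the linker follows the genomic tag in the read, requires an exact linker match, and returns reads without a linker untrimmed when `keepempty` is set. The function name `trim_read` is hypothetical.

```python
LINKER_A = "GTTGGATAAG"  # documented default for linkerA
LINKER_B = "GTTGGAATGT"  # documented default for linkerB


def trim_read(read, linkers=(LINKER_A, LINKER_B),
              minlength=15, maxlength=25, keepempty=False):
    """Return (tag, linker) if the read passes the filters, else None."""
    for linker in linkers:
        pos = read.find(linker)
        if pos != -1:
            tag = read[:pos]  # assumption: the tag precedes the linker
            if minlength <= len(tag) <= maxlength:
                return tag, linker
            return None       # tag too short or too long after trimming
    # No linker found: keep the untrimmed read only if keepempty is set.
    return (read, None) if keepempty else None


good_read = "ACGT" * 5 + LINKER_A + "TTTT"
print(trim_read(good_read))
print(trim_read("ACGT" + LINKER_B))
```

The first read yields a 20 bp tag and is kept; the second yields a 4 bp tag, which falls below `minlength`, so it is dropped.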

STAGE 2 PARAMETERS

shortreads

Should bowtie alignments be done using parameters for very short reads (~20 bp) (TRUE or FALSE). default = TRUE

STAGE 4 PARAMETERS

MACS_qvalue

q-value cutoff for peak calling in MACS2. default = 0.05

MACS_shiftsize

MACS shift size. NULL allows MACS to determine it

peakslop

Number of base pairs to extend peaks on both sides. default = 500

peakinput

Name of user-supplied peaks file. If NULL, Mango will use peaks determined from MACS2 analysis. default = NULL

blacklist

BED file of regions to remove from MACS peaks

STAGE 5 PARAMETERS

distcutrangemin

When Mango determines the self-ligation cutoff this is the minimum distance it will consider. default = 1000

distcutrangemax

When Mango determines the self-ligation cutoff this is the maximum distance it will consider. default = 100000

biascut

Mango excludes very short distance PETs since they tend to arise from self-ligation of a single DNA fragment as opposed to inter-ligation of two interacting fragments. To determine this distance cutoff, Mango determines the fraction of PETs at each distance that come from self-ligation and sets the cutoff at the point where the fraction is less than or equal to biascut. default = 0.05
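
The cutoff rule can be sketched in Python as below. This README does not describe how the self-ligation fraction is estimated, so the sketch takes it as given; the function `self_ligation_cutoff` and the numbers are hypothetical, and treating the distcutrangemin and distcutrangemax bounds as inclusive is an assumption.

```python
def self_ligation_cutoff(distances, self_fraction, biascut=0.05,
                         distcutrangemin=1000, distcutrangemax=100000):
    """Return the smallest in-range distance whose self-ligation fraction
    is <= biascut, or None if no distance qualifies."""
    for dist, frac in sorted(zip(distances, self_fraction)):
        if distcutrangemin <= dist <= distcutrangemax and frac <= biascut:
            return dist
    return None


# Hypothetical fractions of self-ligation PETs at each distance (in bp).
distances = [500, 1000, 2000, 5000, 10000, 20000]
self_fraction = [0.90, 0.60, 0.30, 0.08, 0.04, 0.01]

cutoff = self_ligation_cutoff(distances, self_fraction)
print(cutoff)
```

With these made-up values the fraction first drops to 0.05 or below at 10000 bp, so PETs spanning less than that would be treated as likely self-ligation products.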

FDR

FDR cutoff for significant interactions. default = 0.01

numofbins

number of bins to use for binomial p-value calculations. default = 50

corrMethod

Method to use for correction of multiple hypothesis testing. See (http://stat.ethz.ch/R-manual/R-devel/library/stats/html/p.adjust.html) for more details. default = BH
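
For readers unfamiliar with the default BH (Benjamini-Hochberg) method, the Python sketch below reproduces the standard adjustment: each p-value is scaled by n / rank and the result is made monotone from the largest p-value downward. It is a generic illustration of the method, not code from Mango, and the function name `bh_adjust` and the sample p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Visit p-values from largest to smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    for rank_from_top, idx in enumerate(order):
        rank = n - rank_from_top  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
adjusted = bh_adjust(pvals)
print(adjusted)
```

An interaction would then be called significant when its adjusted p-value is at or below the FDR cutoff.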

maxinteractingdist

The maximum distance (in base pairs) considered for interaction. default = 1000000

extendreads

how many bp to extend reads towards peak. default = 120

minPETS

The minimum number of PETs required for an interaction (applied after FDR filtering). default = 2

reportallpairs

Should all pairs be reported or just significant pairs (TRUE or FALSE). default = FALSE
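
The order of the final filters (FDR first, then minPETS) can be sketched in Python as follows. The record fields `p_adj` and `pets`, the function name `filter_interactions`, and the sample records are hypothetical; treating the FDR cutoff as inclusive is an assumption, and the effect of reportallpairs is not modeled here.

```python
def filter_interactions(interactions, fdr=0.01, min_pets=2):
    """Keep pairs with adjusted p-value <= fdr, then require >= min_pets PETs."""
    significant = [pair for pair in interactions if pair["p_adj"] <= fdr]
    return [pair for pair in significant if pair["pets"] >= min_pets]


# Hypothetical candidate interactions between peak pairs.
candidates = [
    {"name": "peakA-peakB", "pets": 12, "p_adj": 0.0004},
    {"name": "peakA-peakC", "pets": 1, "p_adj": 0.0050},   # too few PETs
    {"name": "peakB-peakD", "pets": 5, "p_adj": 0.2000},   # fails the FDR cutoff
]

reported = filter_interactions(candidates)
print([pair["name"] for pair in reported])
```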