The R implementation of the Bias Elimination Algorithm for Deep Sequencing (BEADS).
❗ RELEASE NOTE ❗
rBEADS is in a pre-release (alpha) stage. The software is provided for testing purposes. Please report problems, bugs, unexpected behavior and missing features here.
The BEADS algorithm requires deep inputs (high read coverage) to work properly. This means more than 50 million reads for worm and fly experiments, and a proportionally higher number for mammalian experiments. Pooling multiple input experiments with the sumBAMinputs function from the rBEADS package is suggested.
BEADS is a normalization scheme that corrects nucleotide composition bias, mappability variations and differential local DNA structural effects in deep sequencing data. In high-throughput sequencing data, the recovery of sequenced DNA fragments is not uniform along the genome. In particular, GC-rich sequences are often over-represented and AT-rich sequences under-represented in sequencing data. In addition, the read mapping procedure also generates regional bias. Sequence reads that can be mapped to multiple sites in the genome are usually discarded. Genomic regions with high degeneracy therefore show lower mapped read coverage than unique portions of the genome. Mappability varies along the genome and thus creates systematic bias. Furthermore, local DNA or chromatin structural effects can lead to coverage inhomogeneity of sequencing data.
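To make the two sequence-level effects described above concrete, here is a minimal base-R sketch. It is not part of rBEADS and does not implement the BEADS correction itself. The toy sequence and the helper names `gc_fraction` and `kmer_is_unique` are made up for illustration. The uniqueness check is a crude stand-in for real mappability, which also accounts for the reverse strand and mismatches.

```r
# Toy sequence (made up): a GC-rich half followed by an AT-rich half.
toy_seq <- "GCGCGGCCGCGCGGCCATATTAATATATTAAT"

# GC fraction in consecutive, non-overlapping windows of `width` bases.
# Hypothetical helper, for illustration only.
gc_fraction <- function(dna, width) {
  bases  <- strsplit(dna, "")[[1]]
  starts <- seq(1, length(bases) - width + 1, by = width)
  vapply(starts, function(s) {
    window <- bases[s:(s + width - 1)]
    mean(window %in% c("G", "C"))
  }, numeric(1))
}

# For each start position, is the k-mer beginning there unique within `dna`?
# Reads drawn from non-unique k-mers would map to several sites and
# typically be discarded, which lowers coverage in those regions.
kmer_is_unique <- function(dna, k) {
  starts <- seq_len(nchar(dna) - k + 1)
  kmers  <- substring(dna, starts, starts + k - 1)
  counts <- table(kmers)
  as.vector(counts[kmers]) == 1
}

gc_by_window <- gc_fraction(toy_seq, width = 8)
unique_mask  <- kmer_is_unique("AAAAACGT", k = 4)
```

In this sketch the GC-rich windows are the regions where over-representation would be expected, and the positions flagged as non-unique are where mapped coverage would drop.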
First, install the required Bioconductor packages by running in R:
source("http://bioconductor.org/biocLite.R")
biocLite(c('methods','IRanges','BSgenome','digest','Rsamtools','rtracklayer','GenomicRanges','Biostrings'))
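On recent R versions, Bioconductor recommends the BiocManager package instead of biocLite. A rough equivalent of the command above might look like the sketch below. It assumes R 3.5 or newer and omits methods, which ships with R.

```r
# Install BiocManager from CRAN if it is not already available.
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

# Same package list as above, minus 'methods' (a base R package).
BiocManager::install(c("IRanges", "BSgenome", "digest", "Rsamtools",
                       "rtracklayer", "GenomicRanges", "Biostrings"))
```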
To install the latest development version directly from GitHub, run in R:
if (!require("devtools")) install.packages("devtools")
devtools::install_github("przemol/rbeads")
Run the following in R to load the library and see the package help:
library(rbeads)
help(rbeads)
An R-style reference manual (PDF) can be found here.
The following pre-calculated mappability tracks (BigWig files) are available at the moment:
- ce10_gem-mappability_36bp.bw - C. elegans mappability track for 36bp reads
- dm3_gem-mappability_36bp.bw - D. melanogaster mappability track for 36bp reads
Human tracks from UCSC:
- wgEncodeCrgMapabilityAlign24mer.bigWig - H. sapiens mappability track for 24bp reads
- wgEncodeCrgMapabilityAlign36mer.bigWig - H. sapiens mappability track for 36bp reads
- wgEncodeCrgMapabilityAlign40mer.bigWig - H. sapiens mappability track for 40bp reads
- wgEncodeCrgMapabilityAlign50mer.bigWig - H. sapiens mappability track for 50bp reads
- wgEncodeCrgMapabilityAlign75mer.bigWig - H. sapiens mappability track for 75bp reads
- wgEncodeCrgMapabilityAlign100mer.bigWig - H. sapiens mappability track for 100bp reads
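When scripting an analysis, it can help to keep the tracks listed above in a small lookup table. The sketch below is a hypothetical helper, not an rBEADS function. The file names are copied from the list, while `tracks`, `pick_track` and the rule of choosing the closest available read length are assumptions made for illustration.

```r
# Lookup table of the pre-calculated tracks listed above.
tracks <- data.frame(
  organism    = c("C. elegans", "D. melanogaster", rep("H. sapiens", 6)),
  read_length = c(36, 36, 24, 36, 40, 50, 75, 100),
  file        = c("ce10_gem-mappability_36bp.bw",
                  "dm3_gem-mappability_36bp.bw",
                  paste0("wgEncodeCrgMapabilityAlign",
                         c(24, 36, 40, 50, 75, 100), "mer.bigWig")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the track file whose read length is closest
# to the requested one. On a tie, the shorter read length is chosen,
# because which.min() returns the first minimum.
pick_track <- function(organism, read_length) {
  candidates <- tracks[tracks$organism == organism, ]
  if (nrow(candidates) == 0)
    stop("no pre-calculated track for ", organism)
  candidates$file[which.min(abs(candidates$read_length - read_length))]
}

human_51bp <- pick_track("H. sapiens", 51)
worm_50bp  <- pick_track("C. elegans", 50)
```

Whether a track computed for a nearby read length is acceptable depends on the experiment; this helper only encodes one simple convention.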
- Publication describing the BEADS algorithm: Cheung, M-S., Down, T.A., Latorre, I., and Ahringer, J. (2011) Systematic bias in deep sequencing data and its correction by BEADS. Nucleic Acids Research 39(15):e103
- Original Python/Java implementation: http://beads.sourceforge.net/