/error_correction

Primary LanguageC++Artistic License 2.0Artistic-2.0

Quake is a package to correct substitution sequencing errors in experiments with deep
coverage (e.g. >15X), specifically intended for Illumina sequencing reads.  Quake
adopts the k-mer error correction framework, first introduced by the EULER genome
assembly package.  Unlike EULER and similar programs, Quake utilizes a robust mixture
model of erroneous and genuine k-mer distributions to determine where errors are
located. Then Quake uses read quality values and learns the nucleotide to nucleotide
error rates to determine what types of errors are most likely. This leads to more
corrections and greater accuracy, especially with respect to avoiding
mis-corrections, which create false sequence dissimilar to anything in the original
genome sequence from which the read was taken.

Quake is freely available from http://www.cbcb.umd.edu/software/quake. Instructions
on how to use Quake can be found at
http://www.cbcb.umd.edu/software/quake/manual.html. Direct questions, concerns, and
feature requests to David Kelley (dakelley@umiacs.umd.edu).

Installation prerequisites:
1. Download and install Boost: http://www.boost.org
2. Open src/Makefile and update CFLAGS to include Boost.  E.g. if you used MacPorts, it may be:
CFLAGS=-O3 -fopenmp -I/opt/local/include -I.
3. Compile Quake by typing "make" in the src directory.
4. Download and install Jellyfish for counting k-mers: http://www.cbcb.umd.edu/software/jellyfish
5. Place a symbolic link to the jellyfish binary in Quake's bin directory, or update the variable "jellyfish_dir" in bin/quake.py to the directory containing the jellyfish binary.
6. Download and install R: http://www.r-project.org
7. Download and install R VGAM library via R command install.packages("VGAM")

Both Boost and R are easily installed via an installation helper e.g. MacPorts (http://www.macports.org) for Mac, and a command such as "sudo port install boost".

To complete the installation, run the command "make" in the src/ directory.