w2rap-contigger

An Illumina PE genome contig assembler that can handle large (17 Gbp), complex (hexaploid) genomes.

Primary language: C++. License: MIT.

http://bioinfologics.github.io/the-w2rap-contigger/

Installation

Pre-requisites

  • CMake 2.8.0+
  • GCC 5.2.0+ (you can also use ICC)
  • (Optional) jemalloc or another malloc library (Intel's works too).

Compilation instructions

This is a big-ish codebase to compile, so we recommend using the -j flag on make to use multiple processors (the examples use 4, but more is better).

git clone https://github.com/gonzalogacc/w2rap-contigger.git
cd w2rap-contigger
cmake -D CMAKE_CXX_COMPILER=g++ .  
make -j 4
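
If you would rather not hard-code the job count, the sketch below picks one from the machine. It is only an illustration: detect_jobs is a hypothetical helper name, not part of w2rap-contigger, and it assumes nproc or getconf is available, falling back to the 4 used in the examples.

```shell
# Hypothetical helper (not part of w2rap-contigger): choose a value for make -j.
# Prints the number of online processors, or 4 if it cannot be detected.
detect_jobs() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    elif getconf _NPROCESSORS_ONLN >/dev/null 2>&1; then
        getconf _NPROCESSORS_ONLN
    else
        echo 4
    fi
}

# Usage, from the w2rap-contigger source directory after running cmake:
#   make -j "$(detect_jobs)"
```

On a shared machine you may want a smaller number than the full core count.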

If you want to link a particular malloc library (jemalloc gives the best performance right now), set the MALLOC_LIBRARY variable when running cmake:

git clone https://github.com/gonzalogacc/w2rap-contigger.git
cd w2rap-contigger
cmake -D CMAKE_CXX_COMPILER=g++ -D MALLOC_LIBRARY=<path_to_library.so> .  
make -j 4

Running w2rap-contigger

You need to create a new directory for the intermediate and output files, then run each of the 6 steps sequentially.

Each step requires the path to the output directory (-o) and a prefix (-p). Step 01 also requires a path to the input reads (-r): either a bam file or 2 fastq files separated by a ','. Step 02 accepts an optional -K parameter to change the k-mer size (K) of the last-stage de Bruijn graph (DBG). All steps accept a -t flag to set the thread count for parallel sections, although not all sections will use all processors.

On most systems (especially most NUMA systems), using thread-local allocation will have a positive impact on performance. We are not aware of cases where it produced a significant negative impact, so we recommend setting the MALLOC_PER_THREAD=1 variable.

Example run with input bam file:

mkdir test_k260
export MALLOC_PER_THREAD=1
./01_unipaths -o test_k260 -p example -r example.bam
./02_qgraph -o test_k260 -p example -K 260
./03_clean -o test_k260 -p example
./04_patching -o test_k260 -p example
./05_simplify -o test_k260 -p example
./06_scaffolding -o test_k260 -p example

Example run with input fastq files:

mkdir test_k260
export MALLOC_PER_THREAD=1
./01_unipaths -o test_k260 -p example -r example_r1.fastq,example_r2.fastq
./02_qgraph -o test_k260 -p example -K 260
./03_clean -o test_k260 -p example
./04_patching -o test_k260 -p example
./05_simplify -o test_k260 -p example
./06_scaffolding -o test_k260 -p example
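
The six commands above can also be wrapped in a small function that stops at the first failing step. This is a minimal sketch, not something shipped with w2rap-contigger: run_w2rap, W2RAP_BIN_DIR and DRY_RUN are hypothetical names introduced here, and the binaries are assumed to live in the current directory unless W2RAP_BIN_DIR says otherwise. Only the flags described above (-o, -p, -r, -K, -t) are used.

```shell
# Hypothetical wrapper (not part of w2rap-contigger).
# Arguments: output_dir prefix reads [K] [threads]
#   reads is a bam file or "r1.fastq,r2.fastq"; K and threads are optional.
# Set DRY_RUN=1 to print the commands instead of running them.
run_w2rap() {
    out_dir=$1; prefix=$2; reads=$3; k=${4:-}; threads=${5:-}
    bin_dir=${W2RAP_BIN_DIR:-.}   # assumption: directory holding the step binaries

    if [ -z "${DRY_RUN:-}" ]; then
        mkdir -p "$out_dir" || return 1
    fi

    for step in 01_unipaths 02_qgraph 03_clean 04_patching 05_simplify 06_scaffolding; do
        # Build the command line for this step in the positional parameters.
        set -- "$bin_dir/$step" -o "$out_dir" -p "$prefix"
        case $step in
            01_unipaths) set -- "$@" -r "$reads" ;;
            02_qgraph)   if [ -n "$k" ]; then set -- "$@" -K "$k"; fi ;;
        esac
        if [ -n "$threads" ]; then set -- "$@" -t "$threads"; fi

        if [ -n "${DRY_RUN:-}" ]; then
            echo "$*"
        else
            echo "Running: $*" >&2
            # Thread-local allocation, as recommended above.
            MALLOC_PER_THREAD=1 "$@" || { echo "Step $step failed" >&2; return 1; }
        fi
    done
}

# Usage:
#   run_w2rap test_k260 example example_r1.fastq,example_r2.fastq 260 8
```

Running it once with DRY_RUN=1 is a cheap way to check the command lines before committing to a long assembly.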

Your assembly will end up in the a.lines.fasta file.
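
For a quick sanity check of the result, something like the following counts the sequences and bases in a FASTA file. It is a generic sketch using awk, not a w2rap-contigger tool; fasta_summary is a hypothetical name, and the path you pass it (where a.lines.fasta sits relative to your output directory) is for you to fill in.

```shell
# Hypothetical helper (not part of w2rap-contigger): summarise a FASTA file.
# Prints "sequences=<count> bases=<count>", or fails if the file is missing or empty.
fasta_summary() {
    fasta=$1
    if [ ! -s "$fasta" ]; then
        echo "missing or empty: $fasta" >&2
        return 1
    fi
    awk '
        /^>/ { n++; next }                           # header line: one more sequence
        { gsub(/[ \t\r]/, ""); bases += length($0) } # sequence line: count its bases
        END { printf "sequences=%d bases=%.0f\n", n, bases }
    ' "$fasta"
}

# Usage (path is an assumption, adjust to your run):
#   fasta_summary path/to/a.lines.fasta
```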