Canu+HISEA

This is a modified Canu assembly pipeline (Canu+HISEA). HISEA is an efficient all-vs-all long-read aligner for SMRT sequencing data. Its algorithm is designed to achieve higher alignment sensitivity than other aligners. HISEA has been integrated into the Canu pipeline in order to produce better assemblies. For details of the HISEA program, please see HISEA.

Canu is a fork of the Celera Assembler, designed for high-noise single-molecule sequencing (such as the PacBio RS II or Oxford Nanopore MinION).

The modified Canu assembly pipeline (Canu+HISEA) runs in four steps, similar to the original Canu pipeline:

1. Detect overlaps in high-noise sequences using HISEA (this step is new in our pipeline)
2. Generate corrected sequence consensus
3. Trim corrected sequences
4. Assemble trimmed corrected sequences

The Canu+HISEA genome assemblies were evaluated using 30X and 50X sub-sampled data extracted from the original datasets downloaded from Pacific Biosciences DevNet Datasets. The 30X and 50X coverage datasets were sampled using the fastqSample utility available in the Canu pipeline.

The HISEA configuration files used for the 30X Canu assembly pipeline can be downloaded from the table below. For 50X, the same configuration files were used, with the corHiseaSensitivity parameter set to normal.

Genome	Configuration File Links
E.coli	E.coli configuration
S.cerevisiae	S.cerevisiae configuration
C.elegans	C.elegans configuration
A.thaliana	A.thaliana configuration
D.melanogaster	D.melanogaster configuration
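
The 50X variant described above can be derived from a 30X file with a single substitution. This is a minimal sketch, assuming the 30X file is saved as config30x.txt (a hypothetical name). The stand-in contents written below are illustrative values only, not the published per-genome settings.

```shell
# Stand-in for a downloaded 30X configuration file (illustrative values only).
cat > config30x.txt <<'EOF'
corOverlapper=hisea
corHiseaMerSize=16
corHiseaSensitivity=high
EOF

# Derive the 50X configuration: same file, with corHiseaSensitivity set to normal.
sed 's/^corHiseaSensitivity=.*/corHiseaSensitivity=normal/' config30x.txt > config50x.txt

# Show the result for review.
cat config50x.txt
```

If the 30X file has no corHiseaSensitivity line, the substitution changes nothing and the parameter has to be added by hand.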

A detailed comparison of HISEA with other leading programs can be found in the HISEA paper (Nilesh Khiste and Lucian Ilie). The plots below show NGA50 results for Canu+HISEA and Canu+MHAP.

NGA50 Comparisons

Build:

git clone https://github.com/lucian-ilie/Canu_HISEA.git
cd Canu_HISEA/src
make -j <number of threads>

Run:

Brief command line help:

../<architecture>/bin/canu

Full list of parameters:

../<architecture>/bin/canu -options
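
The <architecture> directory in the commands above is created by the build. This is a minimal sketch for locating it from within src, assuming this fork keeps the upstream Canu convention of naming the directory <OS>-<machine> and reporting x86_64 as amd64. Please verify against the directory your build actually produces.

```shell
# Assumption: binaries land in <OS>-<machine>/bin beside src/, following the
# upstream Canu build convention, where x86_64 is reported as amd64.
os=$(uname -s)
machine=$(uname -m)
case "$machine" in
  x86_64) machine=amd64 ;;
esac

# Relative path to the canu executable, as seen from the src directory.
canu_bin="../${os}-${machine}/bin/canu"
echo "$canu_bin"
```

On x86-64 Linux this should resolve to ../Linux-amd64/bin/canu if the convention holds.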

HISEA-specific parameters for the configuration file (<tag> is a stage prefix, such as cor in the example below):

<tag>HiseaBlockSize
Number of reads that can fit into 1GB of memory. Combined with <tag>HiseaMemory to compute
the size of the chunks the reads are split into.

<tag>HiseaMerSize
Use k-mers of this size for detecting overlaps.

<tag>HiseaMemory
Memory size per block.

<tag>HiseaSensitivity
One of low, normal, or high.

Here is an example of a dummy configuration file, config.txt:

corOverlapper=hisea
corHiseaMerSize=16
corHiseaSensitivity=high
corHiseaMemory=200
corHiseaBlockSize=20000
corOvlRefBlockSize=20000
useGrid=0
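
A configuration file like this one is passed to canu as a spec file. This is a minimal sketch, assuming the flags of upstream Canu of the same generation (-p, -d, -s, -pacbio-raw, genomeSize); please confirm them against the brief command line help. The prefix, output directory, genome size, and read file name are hypothetical placeholders.

```shell
# Hypothetical inputs: adjust all of these to your data.
canu_bin="canu"            # or the full ../<architecture>/bin/canu path
prefix="ecoli"             # assembly name prefix (placeholder)
outdir="ecoli-hisea"       # output directory (placeholder)
genome_size="4.8m"         # estimated genome size (placeholder)
reads="reads.fastq"        # raw PacBio reads (placeholder)
config="config.txt"        # the HISEA configuration file shown above

# -s hands the HISEA settings in the configuration file to canu.
cmd="$canu_bin -p $prefix -d $outdir genomeSize=$genome_size -s $config -pacbio-raw $reads"

# Print the command for review instead of launching an assembly.
echo "$cmd"
```

Once the printed command looks right, it can be pasted into the shell to start the assembly.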

Learn:

For usage specific to HISEA configuration, please see our webpage. The Canu quick start will get you assembling quickly, while the tutorial explains things in more detail.

Citation:

If you find the Canu+HISEA pipeline useful, please cite the HISEA paper:

N. Khiste, L. Ilie. HISEA: HIerarchical SEed Aligner for PacBio data. BMC Bioinformatics, 2017.