catalog for long-read sequencing tools

long-read-catalog

This repository aims to be a collective effort to build a useful catalog of tools for working with long-read sequencing data.

Long-Read (General)

QA/QC and Trimming

  • Filtlong - quality filtering tool for long reads

Mapping

  • minimap2 - A fast sequence mapping and alignment program

Assembly

Comparison available here.

  • Unicycler - hybrid assembly pipeline for bacterial genomes (miniasm and SPAdes, with Racon and Pilon for polishing)
  • miniasm - Ultrafast de novo assembly for long noisy reads (has no consensus step)
  • canu - A single molecule sequence assembler for genomes large and small
  • Flye - Fast and accurate de novo assembler for single molecule sequencing reads

Polishing

  • Racon - Ultrafast consensus module for raw de novo genome assembly of long uncorrected reads.

Structural Variation

  • Sniffles - structural variation caller using third generation sequencing (PacBio or Oxford Nanopore)

Read Simulation

Comparison available here.

  • Badread - a long read simulator that can imitate many types of read problems
  • LongISLND - Long In silico Sequencing of Lengthy and Noisy Datatypes
  • SiLiCO - a simulator of long-read sequencing for PacBio and Oxford Nanopore

Misc

  • MetaMaps - simultaneously carries out read assignment and sample composition estimation
  • krocus - MLST from long reads
  • ngmlr - long-read mapper designed to align PacBio or Oxford Nanopore reads (standard and ultra-long) to a reference genome, with a focus on reads that span structural variations
  • MEGAN-LR - long-read and contig binning algorithm
  • MAIRA - real-time taxonomic and functional analysis of long reads on a laptop

Oxford Nanopore

QA/QC and Trimming

  • NanoPlot - Plots and Statistics for quality evaluation
  • NanoQC - Quality control tools for long-read sequencing data, aiming to replicate some of the plots made by FastQC
  • NanoStat - Simple statistic summary
  • Porechop - adapter trimmer and demultiplexer for Oxford Nanopore reads (abandonware since Oct 2018)
  • poretools - a toolkit for working with Oxford Nanopore data
  • MinIONQC - fast and simple quality control for MinION sequencing data
  • nanofilt - Filtering and trimming of long read sequencing data

Basecalling

Comparison available here.

  • Albacore - Oxford Nanopore's official command-line basecaller (requires account)
  • Guppy - GPU basecaller (in development)
  • Scrappie - research basecaller
  • DeepNano - developed by Vladimír Boža and colleagues at Comenius University
  • Chiron - third-party basecaller developed by Haotian Teng and others in Lachlan Coin's group at the University of Queensland

Demultiplexing

  • Deepbinner - a signal-level demultiplexer for Oxford Nanopore reads
  • Porechop - adapter trimmer and demultiplexer for Oxford Nanopore reads (abandonware since Oct 2018)
  • Albacore - Oxford Nanopore's official command-line basecaller (requires account)
  • qcat - Python command-line tool for demultiplexing Oxford Nanopore reads from FASTQ files

Polishing

  • nanopolish - Software package for signal-level analysis of Oxford Nanopore sequencing data.
  • npScarf - scaffolds and completes draft genome assemblies (SPAdes) in real time with Oxford Nanopore sequencing.

Structural Variation

  • nanoSV - SV caller for long reads (only tested on Nanopore data)

Read Simulation

  • NanoSim - Nanopore sequence read simulator

Misc

  • nanocomp - Comparison of multiple long read datasets

PacBio

QA/QC and Trimming

  • proovread - PacBio hybrid error correction through iterative short read consensus

Demultiplexing

  • lima - the PacBio barcode demultiplexer; the standard tool for identifying barcode sequences in PacBio single-molecule sequencing data.

Alignment

  • blasr - The PacBio® long read aligner

Read Simulation

  • PBSIM - This is an updated mirror of the original PacBio Read Simulator

Misc

  • pb-jelly - highly automated pipeline that aligns long sequencing reads (such as PacBio RS reads or long 454 reads in FASTA format) to high-confidence draft assemblies.