Rapid comparison and dereplication of genomes

dRep

dRep is a Python program for rapidly comparing large numbers of genomes. dRep can also "de-replicate" a genome set by identifying groups of highly similar genomes and choosing the best representative genome from each group.
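The idea of de-replication can be illustrated with a toy sketch. This is not dRep's implementation: the similarity values, the single-linkage grouping, the `0.95` threshold, and the `quality` score below are all hypothetical and chosen only to show "group similar genomes, keep the best one per group".

```python
from itertools import combinations


def dereplicate(similarity, quality, threshold=0.95):
    """Toy de-replication (illustrative only, not dRep's algorithm).

    similarity: dict mapping (genome_a, genome_b) -> similarity in [0, 1]
    quality:    dict mapping genome -> hypothetical quality score
    Returns the highest-quality genome of each group, sorted by name.
    """
    # Union-find structure: every genome starts in its own group.
    parent = {genome: genome for genome in quality}

    def find(genome):
        while parent[genome] != genome:
            parent[genome] = parent[parent[genome]]  # path compression
            genome = parent[genome]
        return genome

    # Merge the groups of any pair at or above the threshold (single linkage).
    for a, b in combinations(sorted(quality), 2):
        sim = similarity.get((a, b), similarity.get((b, a), 0.0))
        if sim >= threshold:
            parent[find(a)] = find(b)

    # Collect group members under their root.
    groups = {}
    for genome in quality:
        groups.setdefault(find(genome), []).append(genome)

    # Keep the member with the highest quality score from each group.
    return sorted(max(members, key=lambda g: quality[g])
                  for members in groups.values())
```

For instance, with genomes `A` and `B` at 0.99 similarity and `C` dissimilar to both, the sketch keeps whichever of `A` and `B` has the higher score, plus `C`.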

The manual, installation instructions, and API documentation are available at ReadTheDocs

Publication is available at ISMEJ

Open source pre-print publication is available at bioRxiv

Installation with pip

$ pip install drep

Quick start

Genome comparison:

$ dRep compare output_directory -g path/to/genomes/*.fasta

Genome de-replication:

$ dRep dereplicate output_directory -g path/to/genomes/*.fasta
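When calling dRep from a script rather than a shell, the `*.fasta` wildcard is not expanded automatically. The following is a minimal sketch of one way to build the same command lines from Python; the helper names `build_drep_command` and `run_drep` are hypothetical and are not part of dRep, and only the `compare` and `dereplicate` operations and the `-g` option shown above are used.

```python
import glob
import subprocess


def build_drep_command(operation, work_dir, genome_pattern):
    """Build the argument list for `dRep <operation> <work_dir> -g <genomes...>`.

    Hypothetical helper: it expands the wildcard itself, because no shell
    is involved when the command is run through subprocess.
    """
    if operation not in ("compare", "dereplicate"):
        raise ValueError(f"unsupported operation: {operation!r}")
    genomes = sorted(glob.glob(genome_pattern))
    if not genomes:
        raise FileNotFoundError(f"no genomes match {genome_pattern!r}")
    return ["dRep", operation, work_dir, "-g", *genomes]


def run_drep(operation, work_dir, genome_pattern):
    """Run dRep and raise CalledProcessError if it exits with an error."""
    cmd = build_drep_command(operation, work_dir, genome_pattern)
    subprocess.run(cmd, check=True)
```

Usage would look like `run_drep("compare", "output_directory", "path/to/genomes/*.fasta")`, assuming the `dRep` executable is on the `PATH`.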

Make sure dependencies are properly installed:

$ dRep check_dependencies

Dependencies

Required

  • Mash is used to rapidly compare all genomes in a pair-wise manner
  • MUMmer is used to perform more accurate comparisons between genomes that Mash shows to be similar

Optional

  • CheckM is used to determine the contamination and completeness of genomes (used during de-replication)
  • gANI (aka ANIcalculator) is an optional alternative to MUMmer
  • Prodigal is a dependency of both CheckM and gANI

Accessory

  • Centrifuge can be used to perform rough taxonomic assignment of bins

Dependencies

Near Essential

  • Mash - Makes primary clusters (v1.1.1 confirmed works)
  • MUMmer - Performs default ANIm comparison method (v3.23 confirmed works)

Optional

  • fastANI - A fast secondary clustering algorithm
  • CheckM - Determines contamination and completeness of genomes (v1.0.7 confirmed works)
  • gANI (aka ANIcalculator) - Performs gANI comparison method (v1.0 confirmed works)
  • Prodigal - Used by both CheckM and gANI (v2.6.3 confirmed works)
  • NSimScan - Only needed for goANI algorithm (open source version of gANI)
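`dRep check_dependencies` is the supported way to verify an installation. As a rough complement, the sketch below looks up each tool on the `PATH` from Python. The executable names in `TOOLS` are assumptions about how each tool is typically installed (for example, `nucmer` for MUMmer) and may differ on your system; the function names are hypothetical.

```python
import shutil

# Assumed executable names for each dependency; adjust to your installation.
TOOLS = {
    "Mash": "mash",
    "MUMmer": "nucmer",
    "fastANI": "fastANI",
    "CheckM": "checkm",
    "gANI": "ANIcalculator",
    "Prodigal": "prodigal",
    "NSimScan": "nsimscan",
    "Centrifuge": "centrifuge",
}


def find_tools(tools=TOOLS, which=shutil.which):
    """Map each tool name to the path of its executable, or None if not found."""
    return {name: which(exe) for name, exe in tools.items()}


def missing_tools(tools=TOOLS, which=shutil.which):
    """Return the names of tools whose executable is not on the PATH."""
    return [name for name, path in find_tools(tools, which).items() if path is None]


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name:<12} {path or 'not found'}")
```

This only checks that an executable exists; it does not check versions, so the "confirmed works" versions listed above still need to be verified separately.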