
Primary language: C++. License: MIT.

Ultrafast Sample Placement on Existing Trees (UShER)

UShER can be installed with Bioconda and is also available on the European Galaxy server.

NEW: We will now be sharing and updating UShER's pre-processed mutation-annotated tree object for public SARS-CoV-2 sequences here: http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/. We recommend using https://cov2tree.org/?treenomeEnabled=true (developed by Theo Sanderson and Alexander Kramer) to visualize this tree and its genotypes.

UShER is now a package consisting of a family of programs for rapid phylogenetic analyses, particularly suited to SARS-CoV-2 genomes.

  • UShER is a program that rapidly places new samples onto an existing phylogeny using maximum parsimony. It is particularly helpful for understanding how newly sequenced SARS-CoV-2 genomes relate to each other and to previously sequenced genomes in a global phylogeny. This became an important challenge for genomic contact tracing during the COVID-19 pandemic, since the viral phylogeny is already very large (>2M sequences) and is expected to grow many-fold in the coming months. UShER is much faster than existing tools with similar functionality. It has also been integrated into the UCSC SARS-CoV-2 Genome Browser, which for SARS-CoV-2 applications requires neither a UShER installation nor the usage know-how described below. If you have sensitive data that cannot be shared over the Internet, consider using ShUShER, developed by Alex Kramer (https://github.com/amkram/shusher), as an alternative to the Genome Browser. UShER uses the mutation-annotated tree (MAT) data format, a phylogenetic tree whose branches are annotated with the mutations inferred to have occurred on them.
  • matUtils is a toolkit for querying, interpreting and manipulating mutation-annotated trees (MATs). With matUtils, common operations in SARS-CoV-2 genomic surveillance and contact tracing, such as annotating a MAT with new clades, extracting subtrees of the most closely related samples, or converting the MAT to standard Newick or VCF format, can be performed in seconds to minutes, even on a laptop.
  • matOptimize is a program to rapidly and effectively optimize a mutation-annotated tree (MAT) for parsimony using subtree pruning and regrafting (SPR) moves within a user-defined radius.
  • RIPPLES is a program that uses a phylogenomic technique to rapidly and sensitively detect recombinant nodes and their ancestors in a mutation-annotated tree (MAT).
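
To make the mutation-annotated tree idea concrete, here is a minimal Python sketch of a toy MAT, a greedy parsimony placement, and a Newick export. It is not UShER's implementation or file format: the `Node` class and the `add_child`, `genotypes`, `place`, and `to_newick` helpers are hypothetical names chosen for illustration, and the mutation labels are only example strings. The sketch assumes each mutation label acts as a toggle (seeing it again on a later branch reverts it) and only attaches a new sample as a child of an existing node, whereas a real placement would also consider splitting branches.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a toy mutation-annotated tree (hypothetical structure)."""
    name: str
    # Mutations inferred on the branch leading to this node.
    mutations: frozenset = frozenset()
    children: list = field(default_factory=list)


def add_child(parent, name, mutations=()):
    """Attach a new child whose incoming branch carries `mutations`."""
    child = Node(name, frozenset(mutations))
    parent.children.append(child)
    return child


def genotypes(root):
    """Yield (node, genotype) for every node.

    The genotype is the toggle-sum (symmetric difference) of the branch
    mutations on the path from the root; an assumption of this sketch.
    """
    stack = [(root, root.mutations)]
    while stack:
        node, geno = stack.pop()
        yield node, geno
        for child in node.children:
            stack.append((child, geno ^ child.mutations))


def place(root, sample_name, sample_mutations):
    """Attach the sample under the node that needs the fewest extra mutations."""
    sample = frozenset(sample_mutations)
    best_node, best_geno = min(
        genotypes(root), key=lambda pair: len(sample ^ pair[1])
    )
    extra = sample ^ best_geno  # mutations required on the new branch
    add_child(best_node, sample_name, extra)
    return best_node.name, len(extra)


def to_newick(node):
    """Serialize the tree topology (without mutations) as a Newick string."""
    if not node.children:
        return node.name
    inner = ",".join(to_newick(child) for child in node.children)
    return f"({inner}){node.name}"


# Build a small example tree and place one new sample on it.
root = Node("root")
clade_a = add_child(root, "A", {"C241T"})
add_child(clade_a, "s1", {"A23403G"})
add_child(clade_a, "s2", {"G28881A"})
add_child(root, "s3", {"C14408T"})

print(place(root, "new1", {"C241T", "G25563T"}))  # ('A', 1)
print(to_newick(root) + ";")  # ((s1,s2,new1)A,s3)root;
```

Because mutations are stored on branches rather than as full sequences per sample, scoring a candidate placement only requires comparing small mutation sets, which is the intuition behind why placement on a MAT can be fast.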

Please refer to our wiki for detailed instructions on installing and using the UShER package.

Acknowledgement

We thank Jim Kent and the UCSC Genome Browser team for allowing us to download the faToVcf utility (from http://hgdownload.soe.ucsc.edu/admin/exe/). Please read the license terms for faToVcf here: https://github.com/ucscGenomeBrowser/kent/blob/master/src/LICENSE.

References

UShER:

  • Yatish Turakhia, Bryan Thornlow, Angie S. Hinrichs, Nicola De Maio, Landen Gozashti, Robert Lanfear, David Haussler, and Russ Corbett-Detig, "Ultrafast Sample placement on Existing tRees (UShER) enables real-time phylogenetics for the SARS-CoV-2 pandemic", Nature Genetics (2021).

matUtils:

  • Jakob McBroome*, Bryan Thornlow*, Angie S. Hinrichs, Alexander Kramer, Nicola De Maio, Nick Goldman, David Haussler, Russell Corbett-Detig, Yatish Turakhia, "A daily-updated database and tools for comprehensive SARS-CoV-2 mutation-annotated trees", Molecular Biology and Evolution (2021).

matOptimize:

  • Cheng Ye, Bryan Thornlow, Angie Hinrichs, Alexander Kramer, Cade Mirchandani, Devika Torvi, Robert Lanfear, Russell Corbett-Detig, Yatish Turakhia, "matOptimize: A parallel tree optimization method enables online phylogenetics for SARS-CoV-2", Bioinformatics (2022).

RIPPLES:

  • Yatish Turakhia*, Bryan Thornlow*, Angie S. Hinrichs, Jakob McBroome, Nicolas Ayala, Cheng Ye, Kyle Smith, Nicola De Maio, David Haussler, Robert Lanfear, Russell Corbett-Detig, "Pandemic-Scale Phylogenomics Reveals The SARS-CoV-2 Recombination Landscape", Nature (2022).

For masking recommendations, please also cite:

  • Yatish Turakhia*, Nicola De Maio*, Bryan Thornlow*, Landen Gozashti, Robert Lanfear, Conor R. Walker, Angie S. Hinrichs, Jason D. Fernandes, Rui Borges, Greg Slodkowicz, Lukas Weilguny, David Haussler, Nick Goldman and Russell Corbett-Detig, "Stability of SARS-CoV-2 Phylogenies", PLOS Genetics (2020).
  • Landen Gozashti, Conor R. Walker, Robert Lanfear, Nick Goldman, Nicola De Maio and Russell Corbett-Detig, "Issues with SARS-CoV-2 sequencing data: Updated analysis with data from 4 March 2021", Virological (2021) (https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473/15).