GET_HOMOLOGUES: a versatile software package for pan-genome analysis

GET_HOMOLOGUES

A versatile software package for pan-genome analysis, including both GET_HOMOLOGUES and GET_HOMOLOGUES-EST. It includes algorithms designed for:

  • Clustering coding sequences in homologous (possibly orthologous) groups, on the grounds of sequence similarity. By default GET_HOMOLOGUES compares protein sequences, while GET_HOMOLOGUES-EST aligns nucleotide sequences (CDS or transcripts).
  • Definition of pan- and core-genomes by calculation of overlapping sets of protein or nucleotide sequences.
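The second point can be illustrated with plain set operations. The sketch below is not GET_HOMOLOGUES code and does not reproduce its algorithms or file formats: the cluster table, the genome names and the helper functions are all hypothetical. It only assumes that clustering has already produced, for each cluster, the set of genomes that contribute at least one sequence.

```python
from typing import Dict, Set

# Hypothetical input: cluster id -> genomes with at least one member sequence.
# Real cluster files produced by GET_HOMOLOGUES look different; this is a toy table.
clusters: Dict[str, Set[str]] = {
    "cluster_1": {"genomeA", "genomeB", "genomeC"},
    "cluster_2": {"genomeA", "genomeB"},
    "cluster_3": {"genomeC"},
    "cluster_4": {"genomeA", "genomeB", "genomeC"},
}


def genomes_in(table: Dict[str, Set[str]]) -> Set[str]:
    """Return every genome that appears in at least one cluster."""
    found: Set[str] = set()
    for members in table.values():
        found |= members
    return found


def pan_genome(table: Dict[str, Set[str]]) -> Set[str]:
    """Pan-genome: all clusters present in at least one genome."""
    return {cid for cid, members in table.items() if members}


def core_genome(table: Dict[str, Set[str]]) -> Set[str]:
    """Core-genome: clusters present in every genome of the data set."""
    everyone = genomes_in(table)
    return {cid for cid, members in table.items() if members == everyone}


def accessory_genome(table: Dict[str, Set[str]]) -> Set[str]:
    """Clusters in the pan-genome that are missing from at least one genome."""
    return pan_genome(table) - core_genome(table)


if __name__ == "__main__":
    print("pan:", sorted(pan_genome(clusters)))
    print("core:", sorted(core_genome(clusters)))
    print("accessory:", sorted(accessory_genome(clusters)))
```

In this toy table the core-genome holds the two clusters found in all three genomes, and the remaining two clusters are accessory.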

GET_HOMOLOGUES has been used mostly with bacterial data (see citing papers).

GET_HOMOLOGUES-EST, in contrast, has been used mostly with plants (see citing papers). It was originally benchmarked with genomes and transcriptomes of Arabidopsis thaliana and Hordeum vulgare, and with the pan-genomes of Brachypodium distachyon and Brachypodium hybridum (press release).

Installation

Installation instructions, including the bioconda package, are available in the manual and the README.txt file.

See also the Docker image.

Documentation

Manuals are available at:

  • original, for the analysis of bacterial pan-genomes: manual (HTML)
  • EST, for the analysis of intra-species eukaryotic pan-genomes: manual-est (HTML)

In addition, two tutorials are available.

Citation

The original GET_HOMOLOGUES, suitable for bacterial genomes, was described in:

Contreras-Moreira B, Vinuesa P (2013) Appl. Environ. Microbiol. 79:7696-7701

Vinuesa P, Contreras-Moreira B (2015) Methods in Molecular Biology Volume 1231, 203-232

GET_HOMOLOGUES-EST, adapted to the study of intra-specific eukaryotic pan-genomes and pan-transcriptomes, was described in:

Contreras-Moreira B, Cantalapiedra CP et al (2017) Front. Plant Sci. 10.3389/fpls.2017.00184

Contreras-Moreira B, Rodriguez del Rio A et al (2022) Methods in Molecular Biology https://doi.org/10.1007/978-1-0716-2429-6_9

Credits

GET_HOMOLOGUES is designed, created and maintained at the Computational and Structural Biology group at Estación Experimental de Aula Dei, Consejo Superior de Investigaciones Científicas (EEAD-CSIC) and at the Center for Genomic Sciences of Universidad Nacional Autónoma de México (CCG/UNAM).

The program was written mostly by Bruno Contreras-Moreira and Pablo Vinuesa, with contributions from Carlos P Cantalapiedra, Alvaro Rodríguez del Rio, Rubén Sancho, Roland Wilhelm, David A Wilkinson and many others (see CHANGES.txt). It also includes code and binaries from other authors.

Graphical summary

Legend. Main features of GET_HOMOLOGUES.

Legend. Flowchart and features of GET_HOMOLOGUES-EST.

Related software

GET_PHYLOMARKERS uses twin nucleotide & peptide clusters produced by GET_HOMOLOGUES to compute robust multi-gene and pangenome phylogenies. Check the manual, the tutorial, and the Docker image.

GET_PANGENES, a related tool released in 2023, takes FASTA and GFF files as input and explicitly considers gene collinearity by computing whole-genome alignments.

Bugs

The code is regularly patched (see CHANGES.txt) in each release. We kindly ask you to report errors or bugs as GitHub issues and to acknowledge the use of the software in scientific publications.

Funding

Fundación ARAID, Consejo Superior de Investigaciones Científicas, DGAPA-PAPIIT UNAM, CONACyT, FEDER, MINECO, DGA-Obra Social La Caixa.

Badges

GET_HOMOLOGUES is part of the INB/ELIXIR-ES resources portfolio.