get_homologues: A Perl repository from eead-csic-compbio

GET_HOMOLOGUES

A versatile software package for pan-genome analysis, including both GET_HOMOLOGUES and GET_HOMOLOGUES-EST. It includes algorithms designed for:

Clustering coding sequences into homologous (possibly orthologous) groups, on the grounds of sequence similarity. By default GET_HOMOLOGUES compares protein sequences, while GET_HOMOLOGUES-EST aligns nucleotide sequences (CDS or transcripts).
Definition of pan- and core-genomes by calculation of overlapping sets of protein or nucleotide sequences.
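The overlapping-set idea behind pan- and core-genome definitions can be sketched in a few lines. This is a minimal illustration, not code taken from GET_HOMOLOGUES (which is written in Perl). It assumes the homologous clusters have already been computed and are supplied in a hypothetical format: a mapping from cluster name to the set of genomes that contribute at least one sequence. Under that assumption, the core genome is the intersection (clusters found in every genome), the pan-genome is the union, and the accessory genome is their difference.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical input format: cluster name -> set of genome names that
# contribute at least one sequence to that homologous cluster.
Clusters = Dict[str, Set[str]]


def pan_core(clusters: Clusters, genomes: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split clusters into core, accessory and pan sets for a group of genomes."""
    # Pan-genome: union, i.e. clusters present in at least one of the genomes.
    pan = {name for name, present in clusters.items() if present & genomes}
    # Core genome: intersection, i.e. clusters present in every genome.
    core = {name for name in pan if genomes <= clusters[name]}
    # Accessory genome: in the pan-genome but missing from some genome.
    accessory = pan - core
    return core, accessory, pan


def accumulation(clusters: Clusters, order: Iterable[str]) -> List[Tuple[int, int]]:
    """(core size, pan size) after adding genomes one at a time in `order`."""
    seen: Set[str] = set()
    sizes = []
    for genome in order:
        seen.add(genome)
        core, _, pan = pan_core(clusters, seen)
        sizes.append((len(core), len(pan)))
    return sizes


if __name__ == "__main__":
    toy = {
        "c1": {"A", "B", "C"},
        "c2": {"A", "B"},
        "c3": {"C"},
        "c4": {"A", "B", "C"},
    }
    print(pan_core(toy, {"A", "B", "C"}))
    print(accumulation(toy, ["A", "B", "C"]))  # [(3, 3), (3, 3), (2, 4)]
```

Adding genomes in a different order changes the intermediate sizes, which is why accumulation curves are usually summarized over several random genome orders. The function names and input format above are illustrative only.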

GET_HOMOLOGUES has been used mostly with bacterial data (see citing papers).

In contrast, GET_HOMOLOGUES-EST has been used mostly with plants (see citing papers) and was originally benchmarked with genomes and transcriptomes of Arabidopsis thaliana and Hordeum vulgare and the pan-genomes of Brachypodium distachyon and Brachypodium hybridum (press release).

Installation
Documentation
Citation
Credits
Graphical summary
Related software
Bugs
Funding
Badges

Installation

Installation instructions, including the bioconda package, are available in the manual and the README.txt file.

See also the Docker image.

Documentation

Manuals are available at:

Version	HTML manual
Original, for the analysis of bacterial pan-genomes	manual
EST, for the analysis of intra-species eukaryotic pan-genomes	manual-est

In addition, two tutorials are available:

Pangenome analysis of plant transcripts and coding sequences, published in 2022.
From genomes to pangenomes: understanding variation among individuals and species, which includes step-by-step instructions for both bacterial and plant data, first released in 2017.

Citation

The original GET_HOMOLOGUES, suitable for bacterial genomes, was described in:

Contreras-Moreira B, Vinuesa P (2013) Appl. Environ. Microbiol. 79:7696-7701

Vinuesa P, Contreras-Moreira B (2015) Methods in Molecular Biology Volume 1231, 203-232

GET_HOMOLOGUES-EST, adapted to the study of intra-specific eukaryotic pan-genomes and pan-transcriptomes, was described in:

Contreras-Moreira B, Cantalapiedra CP et al. (2017) Front. Plant Sci. doi:10.3389/fpls.2017.00184

Contreras-Moreira B, Rodriguez del Rio A et al. (2022) Methods in Molecular Biology doi:10.1007/978-1-0716-2429-6_9

Credits

GET_HOMOLOGUES is designed, created and maintained at the Computational and Structural Biology group at Estación Experimental de Aula Dei, Consejo Superior de Investigaciones Científicas (EEAD-CSIC) and at the Center for Genomic Sciences of Universidad Nacional Autónoma de México (CCG/UNAM).

The program was written mostly by Bruno Contreras-Moreira and Pablo Vinuesa, with contributions from Carlos P Cantalapiedra, Alvaro Rodríguez del Rio, Rubén Sancho, Roland Wilhelm, David A Wilkinson and many others (see CHANGES.txt). It also includes code and binaries from other authors:

OrthoMCL v1.4 (PubMed=12952885)
mcl v14-137 (PubMed=11917018)
COGtriangles v2.1 (PubMed=20439257)
NCBI Blast-2.16.0+ (PubMed=9254694, 20003500)
BioPerl v1.5.2 (PubMed=12368254)
HMMER 3.1b2
Pfam (PubMed=19920124)
PHYLIP 3.695
Transdecoder r20140704 (PubMed=23845962)
MVIEW 1.60.1 (PubMed=9632837)
diamond 0.8.25 (PubMed=25402007)

Graphical summary

Related software

GET_PHYLOMARKERS uses twin nucleotide and peptide clusters produced by GET_HOMOLOGUES to compute robust multi-gene and pangenome phylogenies. See its manual, tutorial, and Docker image.

A related tool, GET_PANGENES, was released in 2023. It takes FASTA and GFF files as input and explicitly considers gene collinearity by computing whole-genome alignments.

Bugs

The code is regularly patched; see CHANGES.txt for the changes in each release. We kindly ask you to report errors or bugs as GitHub issues and to acknowledge the use of the software in scientific publications.

Funding

Fundación ARAID, Consejo Superior de Investigaciones Científicas, DGAPA-PAPIIT UNAM, CONACyT, FEDER, MINECO, DGA-Obra Social La Caixa.

Badges

GET_HOMOLOGUES is part of the INB/ELIXIR-ES resources portfolio:

eead-csic-compbio/get_homologues