GenEra is a fast and easy-to-use command-line tool that estimates the age of the last common ancestor of protein-coding gene families.

Primary language: Shell. License: GNU General Public License v3.0 (GPL-3.0).

GenEra

Introduction

GenEra is an easy-to-use and highly customizable command-line tool that estimates gene-family founder events (i.e., the age of the last common ancestor of protein-coding gene families) through the reimplementation of genomic phylostratigraphy (Domazet-Lošo et al., 2007).

  • GenEra takes advantage of DIAMOND’s speed and sensitivity to search for homologous genes across the entire NR database, and combines these results with the NCBI Taxonomy to assign an origination date for each gene and gene family in a query species.
  • GenEra can also incorporate protein data from external sources to enrich the analysis. It can detect very recently evolved proteins by incorporating different strains or varieties within the same species, and it can search for proteins within nucleotide data (i.e., genome/transcriptome assemblies) using MMseqs2 to improve the classification of orphan genes. It also calculates a taxonomic representativeness score to assess the reliability of assigning a gene to a specific age.
  • Additionally, GenEra can calculate homology detection failure probabilities using abSENSE to help distinguish fast-evolving genes from high-confidence gene-family founder events.
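
The core idea of phylostratigraphy described above can be illustrated with a minimal Python sketch. This is not GenEra's actual implementation: the function name `assign_phylostratum` and the toy lineages are hypothetical, and the sketch assumes that each lineage is a list of taxon names ordered from the root to the species. For every homolog hit, it finds the deepest taxon shared with the query species (their last common ancestor), and the gene is assigned to the oldest such taxon across all hits.

```python
from typing import List


def assign_phylostratum(query_lineage: List[str],
                        hit_lineages: List[List[str]]) -> str:
    """Return the oldest taxon of the query lineage shared with any hit.

    Lineages are ordered from oldest (root) to youngest (species).
    Hypothetical helper for illustration only.
    """
    # With no informative hits, the gene is treated as species-specific.
    oldest_index = len(query_lineage) - 1
    for lineage in hit_lineages:
        # Count how many leading taxa the hit shares with the query.
        shared = 0
        for query_taxon, hit_taxon in zip(query_lineage, lineage):
            if query_taxon != hit_taxon:
                break
            shared += 1
        if shared == 0:
            continue  # no common ancestor in these toy lineages; skip
        # The last common ancestor sits at position shared - 1;
        # keep the oldest (smallest index) one seen so far.
        oldest_index = min(oldest_index, shared - 1)
    return query_lineage[oldest_index]


# Toy data (assumed for illustration, heavily simplified lineages).
human = ["cellular organisms", "Eukaryota", "Opisthokonta",
         "Metazoa", "Homo sapiens"]
mouse = ["cellular organisms", "Eukaryota", "Opisthokonta",
         "Metazoa", "Mus musculus"]
yeast = ["cellular organisms", "Eukaryota", "Opisthokonta",
         "Fungi", "Saccharomyces cerevisiae"]

print(assign_phylostratum(human, [mouse]))         # Metazoa
print(assign_phylostratum(human, [mouse, yeast]))  # Opisthokonta
print(assign_phylostratum(human, []))              # Homo sapiens
```

In this toy example, a single homolog in yeast is enough to push the gene's age from Metazoa back to Opisthokonta, which is why sensitive homology searches and broad taxonomic sampling matter for the age estimate.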

As of v1.1.0, users can use Foldseek to search protein structural predictions against the AlphaFold DB for fast and sensitive structural alignments. Alternatively, users can reassess gene ages by running JackHMMER on top of DIAMOND (be aware that this additional step significantly slows down the analysis).

Precomputed gene ages (or 'phylomaps') made using GenEra or from previous studies using other tools can be found here.

We recommend that users consult the GenEra wiki for details on installation (via Conda or Docker), database setup, how to run GenEra, and the output files. The wiki also discusses potential downstream analyses that can be performed on the GenEra output.

Please cite the appropriate tools when using the dependencies of GenEra. These citations are valuable in furthering bioinformatics research.

The paper describing the method implemented in GenEra:

Barrera-Redondo, J., Lotharukpong, J.S., Drost, H.G., Coelho, S.M. (2023). Uncovering gene-family founder events during major evolutionary transitions in animals, plants and fungi using GenEra. Genome Biology, 24, 54. https://doi.org/10.1186/s13059-023-02895-z

Acknowledgement

We (Josué Barrera-Redondo, Jaruwatana Sodai Lotharukpong & Hajk-Georg Drost) would like to thank several individuals for making this project possible.

We gratefully thank Susana M. Coelho, the Max Planck Institute for Biology Tübingen and the Max Planck Society for hosting and facilitating this research. We thank Caroline M. Weisman for her helpful comments on how to analyze and interpret the homology detection failure (HDF) probabilities produced by her software, abSENSE. We thank the Max Planck Computing and Data Facility for access to and support of the HPC infrastructure, as well as the BMBF-funded de.NBI Cloud within the German Network for Bioinformatics Infrastructure (de.NBI) (031A532B, 031A533A, 031A533B, 031A534A, 031A535A, 031A537A, 031A537B, 031A537C, 031A537D, 031A538A).

Lastly, we are very grateful to Alice Laigle, Erica Dinatale, Laura Piovani, Michael Borg, Alexandra Dallaire and all the early adopters for their testing and feedback.

Funding

This work was supported by the European Research Council Grant “THETYS” (Grant agreement ID 864038), the Alexander von Humboldt Foundation, the Gordon and Betty Moore Foundation, and the Max Planck Society.