/CNNTrees

Primary language: Jupyter Notebook

Abstract

Each gene has its own evolutionary history, which can differ substantially from the evolutionary histories of other genes. For example, some individual genes or operons can be affected by specific horizontal gene transfer or hybridization events. Thus, the evolutionary history of each gene should be represented by its own phylogenetic tree, which may display different evolutionary patterns from the species tree, or Tree of Life, that represents the main patterns of vertical descent. Here, we present a new efficient method for inferring single or multiple consensus trees and supertrees for a given set of phylogenetic trees (i.e. additive trees or X-trees). Traditional tree consensus methods output a single consensus tree or supertree. We show how Machine Learning (ML) models, based on properties of the Robinson and Foulds topological distance, can be used to partition a given set of trees into one (when the data are homogeneous) or multiple (when the data are heterogeneous) cluster(s) of trees. We used the accuracy, precision, sensitivity, and F1 scores to evaluate the performance of the tree classification. Special attention is paid to the relevant but very challenging problem of inferring alternative supertrees built from phylogenies defined on different, but mutually overlapping, sets of species. The use of machine learning to classify phylogenetic trees makes the new method faster than the existing tree clustering techniques, and thus well suited to the analysis of large genomic datasets.
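
To make the Robinson and Foulds (RF) distance mentioned above more concrete, here is a minimal Python sketch. It is not the code from this repository. It assumes that trees are written as nested tuples of leaf names, that all trees share the same leaf set (so it does not cover the supertree case with overlapping species sets), and that the distance is the unnormalized count of non-trivial bipartitions present in one tree but not the other. The function names `leaf_set`, `bipartitions`, `rf_distance`, and `rf_matrix` are hypothetical and chosen for illustration only.

```python
def leaf_set(tree):
    """Return the frozenset of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial splits of a tree, treated as unrooted.

    Each split is stored as the side that does NOT contain a fixed
    reference leaf, so the same split always gets the same key.
    """
    all_leaves = leaf_set(tree)
    reference = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if reference in clade else clade
        # Skip trivial splits (empty side, single leaf, or all but one leaf).
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return clade

    visit(tree)
    return splits


def rf_distance(tree_a, tree_b):
    """Unnormalized RF distance: size of the symmetric difference of splits."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("this sketch assumes identical leaf sets")
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))


def rf_matrix(trees):
    """Pairwise RF distances as a list of lists (symmetric, zero diagonal)."""
    n = len(trees)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rf_distance(trees[i], trees[j])
    return matrix


# Three toy gene trees on the same five species (illustrative data only).
t1 = (("A", "B"), ("C", ("D", "E")))
t2 = (("A", "C"), ("B", ("D", "E")))
t3 = (("A", "B"), ("D", ("C", "E")))
distances = rf_matrix([t1, t2, t3])
```

A pairwise matrix such as `distances` is the kind of input that a clustering or classification step could consume to decide whether a set of trees is homogeneous or splits into several groups. How the models in this repository actually use the RF distance is not specified in the abstract.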