CorEvol primarily identifies gene pairs under positive and purifying selection in the non-recombinant core gene clusters of a group of organisms. CorEvol accepts both draft and complete genome sequences in GenBank format. The pipeline extracts the coding sequences from the GenBank files into FASTA format as two files, one containing nucleotide sequences and the other containing amino acid sequences. CorEvol uses CD-HIT to cluster the merged amino acid sequences into orthologous groups, based on user-defined sequence identity and length coverage thresholds. From the resulting orthologous gene clusters, CorEvol identifies the core gene clusters and, for each core gene cluster, generates a non-redundant multi-FASTA file by removing any paralogous sequences. Each core gene cluster is then tested for evidence of homologous recombination using the PhiPack software package. The non-recombinant core clusters are used to reconstruct the core-genome phylogeny. For each non-recombinant core cluster, the ratio of non-synonymous to synonymous substitutions (ω) is calculated using the program yn00. Finally, CorEvol carries out a functional analysis of the genes under positive and purifying selection, based on their orthology with the functional categories defined by the Clusters of Orthologous Groups (COGs) database.
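The core-cluster and selection steps described above can be illustrated with a minimal Python sketch. This is not CorEvol's own code: the function names, the `(genome, gene_id)` input layout, and the rule of keeping the first sequence seen per genome when removing paralogs are all assumptions made for illustration. The ω cut-off of 1 follows the usual convention (ω > 1 suggests positive selection, ω < 1 purifying selection); the pipeline's actual criteria may differ.

```python
def core_clusters(clusters, genomes):
    """Return clusters that have a member from every genome.

    `clusters` maps a cluster id to a list of (genome, gene_id) pairs.
    Paralogs are dropped by keeping only the first gene seen per genome
    (an assumed rule, chosen here for simplicity).
    """
    wanted = set(genomes)
    core = {}
    for cluster_id, members in clusters.items():
        one_per_genome = {}
        for genome, gene_id in members:
            # setdefault keeps the first gene and ignores later paralogs
            one_per_genome.setdefault(genome, gene_id)
        if set(one_per_genome) == wanted:
            core[cluster_id] = one_per_genome
    return core


def classify_omega(omega, neutral=1.0):
    """Label a dN/dS ratio using the conventional threshold of 1."""
    if omega > neutral:
        return "positive"
    if omega < neutral:
        return "purifying"
    return "neutral"


# Hypothetical toy input: three genomes, three clusters.
example_clusters = {
    "c1": [("gA", "a1"), ("gB", "b1"), ("gC", "c1")],
    "c2": [("gA", "a2"), ("gA", "a3"), ("gB", "b2"), ("gC", "c2")],
    "c3": [("gA", "a4"), ("gB", "b3")],  # missing gC, so not core
}
example_core = core_clusters(example_clusters, ["gA", "gB", "gC"])
```

In this toy input, `c3` is excluded because it lacks a member from `gC`, and the paralog `a3` in `c2` is dropped in favour of `a2`.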
Developed at CSIR-IICB