JLSteenwyk/PhyKIT

Saturation value exceeded the expected range

Dreamening opened this issue · 1 comment

Hi Jacob,

Thank you for developing PhyKIT, a versatile tool for facilitating phylogenetic analyses. However, I encountered an issue where the saturation value exceeded the expected range, reaching a value greater than 1. I'm using PhyKIT version 1.19.6. I have sent example files demonstrating this issue to your inbox.

Best,
Murphy

Hi Murphy,

Thank you for your inquiry, and I apologize for the lack of clarity in the documentation.

The underlying math of the saturation function changed slightly in a recent release. In PhyKIT v1.19.16, the saturation calculation is correct; however, the documentation needed improvement.

Short explanation
Specifically, the data are not saturated when the value reported by PhyKIT equals 1. Any deviation from 1 indicates some degree of saturation: the further the value is from 1, the greater the saturation.

Longer explanation
It may help to look more closely at what a saturation value equal to 1, less than 1, or greater than 1 means. This explanation follows Philippe et al.'s manuscript, specifically Figure 5.

[Image: Figure 5 from Philippe et al.]
The X-axis is the number of inferred substitutions, derived from patristic distances in the provided tree file. The Y-axis is the number of observed differences in the multiple sequence alignment. Both are measures of distance between taxa.
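
To make the two axes concrete, here is a minimal Python sketch that computes both distances for every pair of taxa. It is not PhyKIT's implementation: the toy alignment and tree, the child-to-parent tree encoding, and the choice to skip gapped sites are assumptions made for illustration (PhyKIT reads a FASTA alignment and a Newick tree, and its gap handling may differ). The alignment distance is expressed as a proportion of sites (an uncorrected p-distance) rather than a raw count.

```python
from itertools import combinations

# Hypothetical toy inputs; PhyKIT itself reads a FASTA alignment and a Newick tree.
ALIGNMENT = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACGAACTTTC",
}

# Rooted tree encoded as child -> (parent, branch_length); "root" has no entry.
# This encoding is an assumption of the sketch, not a PhyKIT data structure.
TREE = {
    "A": ("n1", 0.05),
    "B": ("n1", 0.05),
    "n1": ("root", 0.1),
    "C": ("root", 0.2),
}


def p_distance(seq1: str, seq2: str) -> float:
    """Uncorrected distance: fraction of comparable sites that differ.
    Sites with a gap in either sequence are skipped (an assumption)."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def path_to_root(node: str, tree: dict) -> dict:
    """Map the node and each of its ancestors to their distance from the node."""
    dists = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        dists[parent] = total
        node = parent
    return dists


def patristic_distance(tip1: str, tip2: str, tree: dict) -> float:
    """Sum of branch lengths on the path between two tips."""
    up1 = path_to_root(tip1, tree)
    node, climbed = tip2, 0.0
    # Climb from tip2 until reaching an ancestor of tip1 (their most recent common ancestor).
    while node not in up1:
        parent, length = tree[node]
        climbed += length
        node = parent
    return up1[node] + climbed


def distance_pairs(alignment: dict, tree: dict) -> list:
    """(patristic, uncorrected) distances, i.e. (X, Y), for every pair of taxa."""
    return [
        (patristic_distance(a, b, tree), p_distance(alignment[a], alignment[b]))
        for a, b in combinations(sorted(alignment), 2)
    ]


for x, y in distance_pairs(ALIGNMENT, TREE):
    print(f"patristic={x:.3f}  uncorrected={y:.3f}")
```

Each printed pair is one point on the plot: tree distance on X, alignment distance on Y.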

Thus, a value equal to 1 indicates a perfect correspondence between the pairwise distances among taxa in the alignment and in the phylogeny, meaning no saturation. Values less than 1 indicate that the multiple sequence alignment underestimates distances because of saturation. Values greater than 1 indicate that the distances in the multiple sequence alignment are greater than the distances in the phylogeny.
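
The reading above can be sketched as a slope. This minimal Python example assumes the saturation value is the least-squares slope of alignment distances (Y) on tree distances (X) fitted through the origin; whether PhyKIT fits an intercept is an assumption here, and the `interpret` helper and its `tol` threshold are hypothetical.

```python
def saturation_slope(pairs):
    """Least-squares slope of uncorrected (y) on patristic (x) distances.
    Fitted through the origin: an assumption of this sketch, not confirmed PhyKIT behavior."""
    sxx = sum(x * x for x, _ in pairs)
    if sxx == 0:
        raise ValueError("all patristic distances are zero")
    return sum(x * y for x, y in pairs) / sxx


def interpret(slope, tol=0.05):
    """Hypothetical helper: label a slope using the reading of 1, <1 and >1 given above."""
    if abs(slope - 1) <= tol:
        return "no saturation"
    if slope < 1:
        return "saturated: alignment underestimates distance"
    return "alignment distances exceed tree distances"


# Hypothetical (patristic, uncorrected) distance pairs.
examples = {
    "tree and alignment agree": [(0.1, 0.1), (0.2, 0.2), (0.4, 0.4)],
    "alignment lags the tree": [(0.1, 0.08), (0.2, 0.12), (0.4, 0.18)],
    "tree distances too short": [(0.01, 0.1), (0.02, 0.2), (0.04, 0.4)],
}
for label, pairs in examples.items():
    s = saturation_slope(pairs)
    print(f"{label}: slope={s:.2f} -> {interpret(s)}")
```

The distance `abs(slope - 1)` is one simple way to express how far a dataset is from the unsaturated case.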

Examining the phylogenetic tree you provided, I see numerous branch lengths of 0. I suspect that because the distances in the tree are small, the distances in the multiple sequence alignment are greater than those inferred from the phylogeny, which results in a saturation value greater than 1. Typically, the inverse is observed, especially among highly divergent species. I suspect that your dataset spans millions of years of divergence rather than several hundred million years.
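
A toy illustration of why short tree distances push the value above 1, assuming the saturation value is a least-squares slope fitted through the origin. The distances are hypothetical. Under that assumption, shrinking every branch length by a factor divides the slope by the same factor, so a tenfold shrink takes the slope from 1 to 10. Real trees do not shrink uniformly, so this only shows the direction of the effect.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope of ys on xs with the intercept fixed at 0 (assumed fit)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical pairwise distances in which the tree and alignment agree.
patristic = [0.10, 0.25, 0.40, 0.60]
uncorrected = [0.10, 0.25, 0.40, 0.60]

for shrink in (1.0, 0.5, 0.1):
    # Scaling every branch length by `shrink` scales every patristic distance by the same factor.
    scaled = [x * shrink for x in patristic]
    print(f"shrink={shrink}  slope={slope_through_origin(scaled, uncorrected):.2f}")
```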

I hope this helps! Please let me know if anything is not clear!

best,

Jacob

P.S. Thank you again for using PhyKIT. Please consider checking out our other software; it may be of use to you!