Calculating pairwise cophenetic distances between the tips of a phylogeny can be slow. treeclust provides tools to quickly compute these distances, perform hierarchical clustering, and assign cluster membership.
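As a concrete illustration, the cophenetic distance between two tips is the total branch length on the path joining them. With depths measured from the root, that equals depth(a) + depth(b) - 2 * depth(MRCA), where MRCA is the tips' most recent common ancestor. The pure-Python sketch below computes every pairwise distance for the toy tree `((A:1,B:2):0.5,C:3);`. The nested-list tree representation and the function names are hypothetical and are not part of treeclust, which works on Biopython tree objects.

```python
from itertools import combinations

# Hypothetical toy tree for ((A:1,B:2):0.5,C:3);
# an internal node is a list of (child, branch_length) pairs; a tip is a string.
TREE = [
    ([("A", 1.0), ("B", 2.0)], 0.5),
    ("C", 3.0),
]


def tip_paths(node, depth=0.0, ancestors=()):
    """Yield (tip_name, root_to_tip_depth, ancestors) for every tip under node.

    ancestors is a tuple of (node_id, node_depth) pairs from the root down to
    the tip's parent.
    """
    if isinstance(node, str):
        yield node, depth, ancestors
        return
    here = ancestors + ((id(node), depth),)
    for child, length in node:
        yield from tip_paths(child, depth + length, here)


def cophenetic_distances(tree):
    """Return {(tip_a, tip_b): distance} for every unordered pair of tips."""
    tips = list(tip_paths(tree))
    distances = {}
    for (a, depth_a, path_a), (b, depth_b, path_b) in combinations(tips, 2):
        # The deepest node shared by both root-to-tip paths is the MRCA
        mrca_depth = 0.0
        for (node_a, node_depth), (node_b, _) in zip(path_a, path_b):
            if node_a != node_b:
                break
            mrca_depth = node_depth
        distances[(a, b)] = depth_a + depth_b - 2 * mrca_depth
    return distances


print(cophenetic_distances(TREE))
# {('A', 'B'): 3.0, ('A', 'C'): 4.5, ('B', 'C'): 5.5}
```

The pairwise loop grows quadratically with the number of tips, and each pair also walks its ancestor paths, which is why a naive approach like this slows down on large phylogenies.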
The following software dependencies are required:
- Python, version 3.6 or above
- Biopython, version 1.63 or above
- gcc, version 4.8.2 or above
- gfortran, version 4.8.2 or above
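Before installing, the Python-side requirements can be checked with a short script such as the one below. This is a sketch: the helper names are hypothetical, the version parsing only looks at the leading numeric components of `Bio.__version__`, and gcc and gfortran are not checked (`gcc --version` and `gfortran --version` report those).

```python
import sys

# Minimum versions taken from the requirements list above
MIN_PYTHON = (3, 6)
MIN_BIOPYTHON = (1, 63)


def version_tuple(text):
    """Convert a version string such as '1.79' or '1.81.dev0' into its
    leading integer components, e.g. (1, 79) or (1, 81)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_python_requirements():
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python %d.%d or above is required" % MIN_PYTHON)
    try:
        import Bio
    except ImportError:
        problems.append("Biopython is not installed")
    else:
        if version_tuple(Bio.__version__) < MIN_BIOPYTHON:
            problems.append("Biopython %d.%d or above is required (found %s)"
                            % (MIN_BIOPYTHON + (Bio.__version__,)))
    return problems


if __name__ == "__main__":
    issues = check_python_requirements()
    print("\n".join(issues) if issues else "Python-side requirements look satisfied")
```

Plain `%` formatting is used instead of f-strings so the script still runs, and reports the problem, on interpreters older than 3.6.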
The recommended installation method is via pip:

```shell
pip3 install --user git+https://github.com/scwatts/treeclust.git
```
A typical workflow in Python:

```python
import pathlib

import Bio.Phylo.NewickIO
import treeclust

# Read in the tree, assuming the file contains a single Newick tree
input_phylo_fp = pathlib.Path('my_phylo.txt')
with input_phylo_fp.open('r') as fh:
    tree = next(Bio.Phylo.NewickIO.parse(fh))

# Calculate pairwise cophenetic distances, then cluster them hierarchically
distances, tips = treeclust.copheneticd(tree)
clustering = treeclust.hclust(distances, tree.count_terminals(), 3)

# Assign tip membership by cutting the clustering dendrogram at a specific height
membership = treeclust.cuttree(clustering, 1.5)
```
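Cutting a dendrogram at a height places tips in the same cluster when they are joined at or below that height. The pure-Python sketch below shows the idea on a hypothetical four-tip distance matrix. A naive average-linkage (UPGMA-style) agglomeration records each merge and its height, and a cut at height `h` applies only the merges whose height is at most `h`. This is an illustration, not treeclust's implementation. The function names are hypothetical, and the sketch makes no assumption about which linkage method the `3` passed to `treeclust.hclust` selects.

```python
from itertools import combinations


def average_linkage(dist):
    """Naive average-linkage clustering of a symmetric distance matrix.

    Returns a list of merges (members_a, members_b, height) in merge order.
    """
    clusters = {i: [i] for i in range(len(dist))}
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            ca, cb = clusters[a], clusters[b]
            # Mean distance over all cross-cluster pairs of items
            d = sum(dist[i][j] for i in ca for j in cb) / (len(ca) * len(cb))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


def cut_at_height(n, merges, h):
    """Return a 1-based cluster label for each of n items, applying only
    merges with height <= h (valid because average-linkage heights never decrease)."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the group representative (with path halving)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for members_a, members_b, height in merges:
        if height <= h:
            parent[find(members_b[0])] = find(members_a[0])

    # Number groups 1, 2, ... in order of first appearance
    labels, membership = {}, []
    for i in range(n):
        root = find(i)
        labels.setdefault(root, len(labels) + 1)
        membership.append(labels[root])
    return membership


# Hypothetical pairwise distances between four tips
DIST = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 4.0, 5.0],
    [4.0, 4.0, 0.0, 2.0],
    [5.0, 5.0, 2.0, 0.0],
]

merges = average_linkage(DIST)
print(cut_at_height(len(DIST), merges, 1.5))  # [1, 1, 2, 3]
print(cut_at_height(len(DIST), merges, 3.0))  # [1, 1, 2, 2]
```

The exhaustive pair search makes this far slower than a dedicated implementation, so it is only suitable for toy inputs.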