Species Tree Estimation Using Sister Matrices from Weighted Quartets and Triplets with the Neighbor Joining Algorithm
Steps for species tree estimation using FastME (NJ) and a sister matrix:
- Generate all embedded weighted quartets from a set of gene trees
- Generate the most dominant (i.e. best weighted) quartets from all combinations of quartets
- Generate all embedded weighted triplets from a set of gene trees
- Generate the most dominant (i.e. best weighted) triplets from all combinations of triplets
- Form a sister (similarity) matrix S from the weighted quartets (or triplets) above
- Form a difference matrix D from S, i.e. D = 1 - S, computed element-wise on the normalized S
- Run NJ on this D matrix.
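- Use the DendroPy library to read a tree, remove its branch lengths, and write it back as Newick:
import dendropy

taxa = dendropy.TaxonNamespace()
# input_file: path to a Newick tree file
tree = dendropy.Tree.get_from_path(input_file, "newick", taxon_namespace=taxa, rooting="force-rooted")
# https://dendropy.org/primer/trees.html
# Drop all branch lengths so that only the topology is written out
for edge in tree.postorder_edge_iter():
    edge.length = None
output_tree = tree.as_string("newick").strip()
# Remove the "[&R] " rooting annotation that DendroPy prepends to rooted trees
output_tree = output_tree.replace("[&R] ", "")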
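The matrix steps listed earlier (dominant quartets, sister matrix S, difference matrix D = 1 - S) can be illustrated with a minimal sketch. It is not the code in the repository's scripts. The quartet representation `((a, b), (c, d), weight)`, the function names, and the choice to normalize S by its largest off-diagonal entry are all assumptions made for illustration.

```python
from itertools import combinations


def dominant_quartets(weighted_quartets):
    """Keep the heaviest topology for each set of four taxa.

    Each item is ((a, b), (c, d), weight), read as the quartet ab|cd
    (assumed representation).
    """
    best = {}
    for (a, b), (c, d), w in weighted_quartets:
        key = frozenset((a, b, c, d))
        if key not in best or w > best[key][2]:
            best[key] = ((a, b), (c, d), w)
    return list(best.values())


def sister_matrix(quartets):
    """Add each quartet's weight to the entries of its two sister pairs."""
    taxa = sorted({t for left, right, _ in quartets for t in left + right})
    S = {x: {y: 0.0 for y in taxa} for x in taxa}
    for left, right, w in quartets:
        for x, y in (left, right):
            S[x][y] += w
            S[y][x] += w
    return taxa, S


def difference_matrix(taxa, S):
    """D = 1 - S / max(S) off the diagonal, 0 on the diagonal (assumed normalization)."""
    max_s = max((S[x][y] for x, y in combinations(taxa, 2)), default=0.0)
    D = {x: {y: 0.0 for y in taxa} for x in taxa}
    for x, y in combinations(taxa, 2):
        s = S[x][y] / max_s if max_s > 0 else 0.0
        D[x][y] = D[y][x] = 1.0 - s
    return D


# Toy input: two competing topologies on {A, B, C, D} and two on {A, B, C, E}
quartets = [
    (("A", "B"), ("C", "D"), 5.0),
    (("A", "C"), ("B", "D"), 2.0),
    (("A", "B"), ("C", "E"), 3.0),
    (("A", "E"), ("B", "C"), 1.0),
]
taxa, S = sister_matrix(dominant_quartets(quartets))
D = difference_matrix(taxa, S)
```

In this toy input, A and B are sisters in both dominant quartets, so they end up with the smallest difference and would be the first pair joined by NJ.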
- Requires FastME to be set up, with the fastme-2.1.5.2-linux64 binary in the same directory as the required Python scripts
- For quartets, the quartet-controller.sh, summarize_quartets.py and numeric_form_matrix_quartets.py scripts are needed
- For triplets, the triplet_count.sh, triplet-encoding-controller.sh and numeric_form_matrix_quartets.py scripts are needed
python3 SCRIPTS_For_NJ_quartets/get_NJ_Tree_using_quartets.py "best-wqrts-file" "output-file-name"
python3 SCRIPTS_For_NJ_triplets/compute_NJ_Tree_using_triplets.py "best-wtriplets-file" "output-file-name"
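FastME reads its distance matrix in PHYLIP format, so the D matrix has to be written to such a file before NJ can run. The sketch below shows one way to do that. The function name `write_phylip_matrix` and the nested-dictionary layout of `D` are hypothetical, and the repository's scripts may write the file differently.

```python
import io


def write_phylip_matrix(taxa, D, handle):
    """Write a square distance matrix in PHYLIP format.

    taxa   : list of taxon names, in row/column order
    D      : nested dict, D[x][y] is the distance between taxa x and y
    handle : any writable text file object
    """
    # First line: number of taxa
    handle.write(f"{len(taxa)}\n")
    # One row per taxon: name followed by its distances to every taxon
    for x in taxa:
        row = " ".join(f"{D[x][y]:.6f}" for y in taxa)
        handle.write(f"{x} {row}\n")


# Toy example with two taxa, written to an in-memory buffer
example_taxa = ["A", "B"]
example_D = {"A": {"A": 0.0, "B": 1.0}, "B": {"A": 1.0, "B": 0.0}}
buffer = io.StringIO()
write_phylip_matrix(example_taxa, example_D, buffer)
```

The resulting file can then be passed to the FastME binary; consult the FastME manual for the exact command-line options that select the NJ method.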
Neighbor Joining is computed by the FastME tool.
V. Lefort, R. Desper, O. Gascuel (2015) FastME 2.0: A comprehensive, accurate, and fast distance-based phylogeny inference program, Molecular Biology and Evolution 32(10):2798-2800. doi:10.1093/molbev/msv150
SisterEstimation uses some methods from the PhyloNet package for Robinson-Foulds (RF) distance computations.
C. Than, D. Ruths, L. Nakhleh (2008) PhyloNet: A software package for analyzing and reconstructing reticulate evolutionary histories, BMC Bioinformatics 9:322.