This repository contains an implementation of Phylo2Vec which includes:
cfg/
: Example configuration files.
data/
: Placeholder folder to contain sequence files in FASTA format.
examples/
: Example notebooks for different datasets.
hc/
: Phylogenetic tree optimisation via hill climbing.
    - Branch length and nucleotide substitution model optimisation relies on RAxML-NG.
tests/
: Placeholder folder for unit tests.
trees/
: Placeholder folder to contain tree files as Newick strings.
utils/
: Utility functions, including definitions of Phylo2Vec and transforms from commonly used tree formats to Phylo2Vec (and vice versa).
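As a rough illustration of what a Phylo2Vec vector looks like, the sketch below assumes the constraint from the Phylo2Vec paper: a tree with n leaves is encoded as an integer vector of length n - 1 whose entry at 0-indexed position i lies in {0, ..., 2i}. The function names are hypothetical and are not taken from utils/; the updated definition used in the Colab demo may differ in how a vector maps to a tree.

```python
import random


def is_valid_phylo2vec(v):
    """Return True if v[i] is an integer in [0, 2*i] for every position i."""
    return all(isinstance(x, int) and 0 <= x <= 2 * i for i, x in enumerate(v))


def sample_phylo2vec(n_leaves, rng=None):
    """Draw a random vector of length n_leaves - 1, each entry uniform on its range."""
    rng = rng or random.Random()
    return [rng.randint(0, 2 * i) for i in range(n_leaves - 1)]


def count_vectors(n_leaves):
    """Number of valid vectors for n_leaves: 1 * 3 * 5 * ... * (2n - 3)."""
    total = 1
    for i in range(n_leaves - 1):
        total *= 2 * i + 1
    return total
```

Under this assumption, the number of valid vectors for n leaves equals the number of rooted binary tree topologies on n labelled leaves, so `count_vectors(4)` gives 15.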
A quick demo of hill-climbing optimisation with Phylo2Vec is available in the demo.ipynb notebook.
A more minimalistic demo with an updated definition of Phylo2Vec is available on Colab.
To reproduce the environment, run:
conda env create -f env.yml
To run hill-climbing optimisation with Phylo2Vec:
conda activate phylo
python -m hc.main
- Download a binary of RAxML-NG at: https://github.com/amkozlov/raxml-ng. For Windows, consider using the Windows Subsystem for Linux.
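To give a feel for the idea behind hill climbing over Phylo2Vec vectors, here is a minimal sketch. It is not the code in hc/: the `hill_climb` function, the single-entry proposal move, and the toy objective (Hamming distance to a fixed target vector) are all assumptions made for illustration. In the repository, tree evaluation relies on RAxML-NG instead.

```python
import random


def hill_climb(v, score, n_rounds=50, seed=0):
    """Greedy hill climbing: keep a single-entry change only if it lowers score(v).

    Assumes len(v) >= 2 and that entry i must stay within [0, 2*i].
    """
    rng = random.Random(seed)
    best, best_score = list(v), score(v)
    for _ in range(n_rounds):
        # Entry 0 can only be 0, so propose changes at positions >= 1.
        i = rng.randrange(1, len(best))
        candidate = list(best)
        candidate[i] = rng.randint(0, 2 * i)
        candidate_score = score(candidate)
        if candidate_score < best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


# Toy objective (hypothetical): number of entries that differ from a target vector.
target = [0, 2, 1, 5, 3]


def toy_score(v):
    return sum(a != b for a, b in zip(v, target))


start = [0, 0, 0, 0, 0]
best, best_score = hill_climb(start, toy_score, n_rounds=500)
```

Because a change is accepted only when it strictly improves the score, the returned score is never worse than the starting one, but the search can stall in a local optimum on harder objectives.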
The following datasets were used:
primates
: https://evolution.gs.washington.edu/book/datasets.html
fluA
: https://github.com/4ment/phylostan/tree/master/examples
M501
: DS2 dataset in https://github.com/zcrabbit/vbpi-gnn/tree/main/data/hohna_datasets_fasta
h3n2_na_20, zika
: https://github.com/neherlab/treetime_examples
yeast
: https://cran.r-project.org/web/packages/phangorn/index.html (comes with pre-loaded datasets including yeast)
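Since sequence files are expected in FASTA format, a quick sanity check on a downloaded dataset can be sketched as follows. The `read_fasta` helper and the example records are hypothetical and not part of this repository; the check only confirms that all sequences have the same length, as expected for an alignment.

```python
def read_fasta(lines):
    """Parse FASTA text (an iterable of lines) into a {header: sequence} dict."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line starts a new record; the text after ">" is its name.
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


# Made-up example records, for illustration only.
example = """>taxon_a
ACGT
ACGA
>taxon_b
ACGTACGC
""".splitlines()

alignment = read_fasta(example)
lengths = {len(seq) for seq in alignment.values()}
is_aligned = len(lengths) == 1
```

To check a real file, pass an open file handle instead of `example`, since iterating over a file yields its lines.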
As mentioned in the submission, we plan to add more optimisation schemes using Phylo2Vec, e.g., MCTS or gradient descent.