
Generalizing Phylogenetic Posterior Estimator from MCMC samples via subsplit Bayesian networks


Subsplit Bayesian Networks for Generalizing Phylogenetic Posterior Estimation

Thank you for your interest in our paper: Generalizing Tree Probability Estimation via Bayesian Networks.

Please consider citing the paper if you use any of this material in your research.

@incollection{NIPS2018_7418,
title = {Generalizing Tree Probability Estimation via Bayesian Networks},
author = {Zhang, Cheng and Matsen IV, Frederick A},
booktitle = {Advances in Neural Information Processing Systems 31},
editor = {S. Bengio and H. Wallach and H. Larochelle and K. Grauman and N. Cesa-Bianchi and R. Garnett},
pages = {1449--1458},
year = {2018},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/7418-generalizing-tree-probability-estimation-via-bayesian-networks.pdf}
}

Dependencies

Basic Usage

Load MCMC samples

from utils import summary, mcmc_treeprob
# for golden runs
tree_dict_total, tree_names_total, tree_wts_total = summary(dataname, data_directory)
# for sample runs
tree_dict, tree_names, tree_wts = mcmc_treeprob(path_to_data, 'nexus')
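The three returned objects are described further down as a dictionary of unique trees, a list of their names, and the matching frequencies. As a rough illustration of that shape only, and not the code in utils, the sketch below collapses a list of sampled topologies into those three objects. The helper name collapse_samples, the tree_N naming scheme, and the use of plain Newick strings in place of tree objects are all assumptions made for this example.

```python
from collections import Counter


def collapse_samples(sampled_topologies):
    """Hypothetical helper: collapse repeated samples into unique trees with frequencies."""
    counts = Counter(sampled_topologies)
    total = sum(counts.values())
    tree_dict, tree_names, tree_wts = {}, [], []
    # most_common() orders by count, so the most frequent topology is named first
    for i, (topology, count) in enumerate(counts.most_common(), start=1):
        name = f"tree_{i}"          # assumed naming scheme, for illustration only
        tree_dict[name] = topology  # unique tree, keyed by its name
        tree_names.append(name)
        tree_wts.append(count / total)  # relative frequency in the sample
    return tree_dict, tree_names, tree_wts


# Toy sample: Newick strings stand in for the real tree objects
samples = ["((A,B),(C,D));", "((A,C),(B,D));", "((A,B),(C,D));", "((A,B),(C,D));"]
tree_dict, tree_names, tree_wts = collapse_samples(samples)
```

The frequencies sum to one, and tree_wts[i] always belongs to the tree named tree_names[i].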

Run SBN

from models import SBN

# parameters to set up the model
#   @taxa is the taxa list of the dataset
#   @emp_tree_freq is the empirical frequency dictionary of the trees; it can be left as None if KL divergence computation is not required.
model = SBN(taxa, emp_tree_freq)

# parameters to train the model
#   @tree_dict is the unique tree dictionary
#   @tree_names is the name list of the trees
#   @tree_wts is the corresponding frequencies for the trees with names in tree_names

# run sbn-sa
model.bn_train_prob(tree_dict, tree_names, tree_wts)
# run sbn-em
logp = model.bn_em_prob(tree_dict, tree_names, tree_wts, maxiter=200, abstol=1e-05, monitor=True, MAP=False)
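The maxiter and abstol arguments suggest an iterative procedure with a stopping rule. The sketch below shows one common rule of that kind: stop once the objective changes by less than abstol between iterations, or after maxiter iterations. It is a generic illustration under that assumption, not the repository's EM code; run_until_converged and toy_step are hypothetical names.

```python
def run_until_converged(step, init_value, maxiter=200, abstol=1e-05):
    """Call step() repeatedly; stop when the objective changes by less than abstol."""
    history = [init_value]
    for _ in range(maxiter):
        value = step()
        history.append(value)
        # absolute change between consecutive objective values
        if abs(history[-1] - history[-2]) < abstol:
            break
    return history


# Toy objective that approaches 0 geometrically (stands in for a log-likelihood)
state = {"value": -1.0}


def toy_step():
    state["value"] *= 0.5
    return state["value"]


history = run_until_converged(toy_step, init_value=-1.0)
```

Keeping the full history, rather than only the last value, makes it easy to plot or inspect how quickly the objective levels off.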

Once trained, one can compute the SBN probabilities of trees

sbn_est_prob = model.bn_estimate(tree)

When emp_tree_freq is provided, one can evaluate the KL divergence

sbn_kl_div = model.kl_div(method='bn')['bn']
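For reference, the KL divergence between two discrete distributions over trees can be written out directly. The sketch below assumes the divergence is taken from the empirical distribution p to the estimate q, that is, KL(p || q) = sum over trees of p(t) * log(p(t) / q(t)). It is a standalone illustration rather than the implementation behind model.kl_div; the function name kl_divergence, the eps floor, and the toy dictionaries are assumptions.

```python
import math


def kl_divergence(emp_freq, est_prob, eps=1e-300):
    """KL(p || q) = sum_t p(t) * log(p(t) / q(t)), summed over trees with p(t) > 0."""
    kl = 0.0
    for tree_id, p in emp_freq.items():
        if p <= 0.0:
            continue  # terms with p(t) = 0 contribute nothing
        # floor q at eps so a missing or zero estimate gives a large finite penalty
        q = max(est_prob.get(tree_id, 0.0), eps)
        kl += p * math.log(p / q)
    return kl


# Toy distributions over two trees
emp_freq = {"t1": 0.5, "t2": 0.5}
est_prob = {"t1": 0.25, "t2": 0.75}
kl_value = kl_divergence(emp_freq, est_prob)
```

The result is zero when the estimate matches the empirical frequencies exactly and grows as the estimate puts too little mass on frequently sampled trees.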

See the Jupyter notebooks for more detailed examples.