Algorithm for paring trees based on branch support

Bonsai

Tree pruning using concordance factors and branch lengths

Authors

Gregg Thomas and Jeffrey Good

About

Bonsai offers informed tree pruning to maximize the phylogenetic concordance of the loci underlying a species tree. This can help facilitate analyses on large trees.

The program also offers a suite of alignment and concordance factor calculations.

Installation

Simply download this repository and run the program. You may want to add the bonsai folder to your $PATH variable for ease of use.

The only dependency is Python 3 or later.

Download the repository in one of two ways:

  1. With git
  • Locate the green button labeled <> Code above the repository files on this page.
  • Click it and copy the URL listed in the text field.
  • Open a command prompt on the machine where you want to install bonsai and navigate to the directory where you want to download it.
  • Clone the repository with git clone [URL], where [URL] is what you copied from this page.
  2. As a ZIP archive
  • Locate the green button labeled <> Code above the repository files on this page.
  • Click it, select Download ZIP, and follow your browser's instructions for downloading a file.
  • Make sure the ZIP archive is located where you want the program to be installed, then unzip it. The steps differ by OS but usually involve right-clicking or control-clicking the archive and selecting something like Extract.

Usage

  1. Label a species tree:
python bonsai.py -st [species tree file] --labeltree
  2. Calculate concordance factors using input gene trees and exit:
python bonsai.py -st [species tree file] -gt [gene trees file] --cf
  3. Calculate concordance factors using input gene trees and alignments, calculate alignment statistics, and exit:
python bonsai.py -st [species tree file] -gt [gene trees file] -d [directory with alignments in FASTA format] --stats --cf
  4. Use input gene trees to calculate concordance factors and prune the species tree based on them. Do a maximum of 3 iterations of pruning:
python bonsai.py -st [species tree file] -gt [gene trees file] -i 3
  5. Prune the input species tree using the labels already in the tree. Do a maximum of 5 iterations of pruning:
python bonsai.py -st [species tree file] -gt [gene trees file] -i 5 --labels
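
For scripted or batch runs, the invocations above can be assembled programmatically. The following is a minimal sketch and not part of Bonsai itself: the helper name `build_bonsai_command`, its parameters, and the file names `species.tre` and `genes.tre` are hypothetical, and it only uses flags documented in this README.

```python
import sys


def build_bonsai_command(species_tree, gene_trees=None, iterations=None,
                         cf_only=False, use_labels=False,
                         script="bonsai.py"):
    """Assemble an argument list for bonsai.py (hypothetical helper)."""
    # -st is the only option marked REQUIRED for every mode.
    cmd = [sys.executable, script, "-st", species_tree]
    if gene_trees is not None:
        cmd += ["-gt", gene_trees]
    if iterations is not None:
        cmd += ["-i", str(iterations)]
    if cf_only:
        cmd.append("--cf")
    if use_labels:
        cmd.append("--labels")
    return cmd


# Example: prune with at most 3 iterations (placeholder file names).
cmd = build_bonsai_command("species.tre", "genes.tre", iterations=3)
print(" ".join(cmd[1:]))
# To actually launch the run, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string avoids shell-quoting problems when file paths contain spaces.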

Options

| Option | Description |
| --- | --- |
| -st | A file or string containing a Newick formatted tree as the species tree (REQUIRED). |
| -gt | A file containing one or more Newick formatted trees, one per line, as gene trees (REQUIRED except with --labeltree or --prune). |
| -o | The desired output directory. This will be created for you if it doesn't exist. Default: bonsai-[date]-[time]. |
| -b | The lower percentile of branch lengths to consider for paring at each iteration. Must be between 0 and 100. Default: 10. |
| -g | The lower threshold of gene concordance factor to consider for paring at each iteration. Must be between 0 and 100. Default: 25. |
| -i | The maximum number of paring iterations to perform before stopping. Default: 3. |
| -m | The maximum number of species (tips) that can be pruned while paring any given single branch. Default: 10. |
| -p | A file containing branches to prune from the input tree(s), one branch per line, defined either as internal node labels from --labeltree or as 2 tips that descend from that branch (e.g. 'spec1 spec2'). Lines in the file starting with '#' are ignored. |
| -e | A file containing branches to be exempt from pruning (note that they can still be PARED), one branch per line, defined either as internal node labels from --labeltree or as 2 tips that descend from that branch (e.g. 'spec1 spec2'). Lines in the file starting with '#' are ignored. |
| -d | A directory of corresponding alignments to calculate site concordance factors on the species tree (optional). |
| -scf | The number of quartets to sample around each branch for sCF calculations. |
| -n | The number of processes that Bonsai should use. Default: 1. |
| --cf | Stop after calculating concordance factors. |
| --stats | When a directory of alignments is given with -d, set this flag to write out some stats for the alignments. |
| --labels | Set this option to disable concordance factor calculations and use the labels on the input tree for paring. Note that if these labels are NOT concordance factors, you may also need to adjust -g. |
| --labeltree | Just read the species tree from the input, label the internal nodes, print, and exit. |
| --overwrite | Set this to overwrite existing files. |
| --appendlog | Set this to keep the old log file even if --overwrite is specified. New log information will instead be appended to the previous log file. |
| --version | Simply print the version and exit. Can also be called as -version, -v, or --v. |
| --info | Print some meta information about the program and exit. No other options required. |
| --quiet | Set this flag to prevent bonsai from reporting detailed information about each step. |
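
The branch files accepted by -p and -e follow a simple line-based format, so they are easy to validate before a run. The sketch below illustrates that format only and is not Bonsai's own parser: the function name `parse_branch_file` and the label `node12` are hypothetical, and treating tips as whitespace-separated and skipping blank lines are assumptions not stated in the option descriptions.

```python
def parse_branch_file(lines):
    """Parse -p / -e style lines into a list of branch definitions.

    Each entry is a tuple: (label,) for an internal node label, or
    (tip1, tip2) for two tips that descend from the branch.
    """
    branches = []
    for raw in lines:
        line = raw.strip()
        # Lines starting with '#' are ignored (documented);
        # blank lines are skipped as well (assumption).
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # assumes whitespace-separated tip names
        if len(fields) not in (1, 2):
            raise ValueError(
                f"Expected a node label or two tips, got: {line!r}")
        branches.append(tuple(fields))
    return branches


# Example input; 'node12' stands in for a label produced by --labeltree.
example = [
    "# branches to prune",
    "spec1 spec2",
    "node12",
]
print(parse_branch_file(example))
```

To check a file on disk, pass an open file handle, for example `parse_branch_file(open("prune.txt"))`, where `prune.txt` is a placeholder name.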

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant Number (DEB 1754096).

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.