/beast-phylonco-paper

Data and analysis scripts to accompany BEAST phylonco

Primary LanguagePython

BEAST phylonco paper

This repository contains data and analysis scripts that accompany the Beast phylonco paper below.

Paper: Chen, K., Moravec, J. C., Gavryushkin, A., Welch, D., & Drummond, A. J. (2022). Accounting for errors in data improves divergence time estimates in single-cell cancer evolution. Molecular biology and evolution, 39(8), msac143.

Beast phylonco software: A BEAST2 package for single-cell phylogenetic analysis of cancer evolution.

Software requirements

Java 8 and BEAST v2.5

We provide a bundled jar version of BEAST2.5 with Phylonco in beast-phylonco.jar, see analysis section.

Python 3 and packages:

DendroPy~=4.5.2
lxml~=4.8.0
matplotlib~=3.4.3
numpy~=1.21.2
seaborn~=0.11.2

R language, tracerR and packages:

ape
expm
ggtree
ggplot2
tools
treeio
TreeSimGM

(Optional) Simulating new GT16 datasets additionally requires Java 16, LPhy and LPhyBeast and Phylonco-LPhyBeast.

See LPhy setup instructions here.

Datasets

Simulated datasets:

Simulated datasets and parameters are in the directories sim1/data to sim7/data.

True simulation parameters are stored in the files *_true.csv or *_true.log and true trees are stored in the files *._true.trees.

Beast analysis XML files are in sub-directories sim1/data/*.xml to sim7/data/*.xml for each dataset.

Real datasets:

Real datasets are available in FASTA format (with GT16 encoding) in E15/data and L86/data.

Beast analysis XML files are in E15/data/*.xml and L86/data/*.xml

Simulating new datasets

Binary datasets:

Go to the sim1/scripts sub-directory

Run simulate_binary.sh

Run python3 binary_xml_transformer.py

GT16 datasets:

Setup instructions: https://github.com/bioDS/beast-phylonco/releases/tag/v0.0.6 and https://linguaphylo.github.io/setup/

Run LPhyBeast with arguments -l <chain length> -r <num repeats> <path to lphy script>

  • chain length: length of the mcmc chain

  • num repeats: number of experimental repeats, e.g. -r 10 for 10 repeats

  • path to lphy script: lphy scripts are in sim3/scripts/*.lphy and sim7/scripts/*.lphy

Example command:

$BEAST_DIR/bin/lphybeast -l 10000000 -r 10 sim7/scripts/gt16_delta_0.lphy

Running the analysis

Running BEAST2:

We provide a bundled jar version of BEAST2 with Phylonco and related packages. This does not require a separate BEAST2 install.

To run the analysis, use java -jar beast-phylonco.jar <path to xml>.

Substitute <path to xml> with the file path to the Beast XML file.

Example command:

java -jar beast-phylonco.jar sim1/data/binary_yule_n30_L400_0.xml

Post-processing:

Beast log stats: from R run mcmc_stats.r (edit "mcmc_path" to point to your beast logs directory).

Beast log viewer: logs can be viewed using Tracer.

Beast tree stats: trees can be summarized using TreeAnnotator that is bundled with Beast software.

Beast tree viewer: trees can be viewed using Figtree or any compatible beast tree visualization software.

Visualizing output

Beast logs:

Beast logs for sim1 to sim6 are available in the sim1/beast to sim6/beast sub-directories on github

Beast logs for sim7, E15 and L86 are available on Google Drive https://drive.google.com/drive/folders/1vQ6xvs3qq4vJtiI7aDjqBP8xPF__VXAH?usp=sharing

Unzip the downloaded beast logs archive .zip inside the dataset directory (e.g., E15 or L86)

Generating figures:

Coverage plots: run python3 plot_coverage.py from the scripts sub-directory.

Tree statistics plots: run python3 plot_tree_stats.py from the scripts sub-directory.

Summary tree plots: run plot_tree_*.py from the scripts sub-directory.

Extra supplementary plots: run python3 plot_*.py from the scripts sub-directory.

Citations

Software and models:

Datasets: