Primary language: Jupyter Notebook. License: Apache License 2.0 (Apache-2.0).

VERSO-UTILITIES

Utilities repository for VERSO (Viral Evolution ReconStructiOn). This repository contains data and scripts to reproduce the analysis on both real-world and simulated datasets presented in the article: https://www.cell.com/patterns/fulltext/S2666-3899(21)00022-2

Note that this is not the repository of the VERSO tool and does not include the related documentation (please refer to: https://github.com/BIMIB-DISCo/VERSO for the tool and the documentation).

REAL-WORLD DATASETS

To reproduce the analyses of Datasets #1 (Amplicon) and #2 (RNA-seq) via VERSO, please go to the folder named: https://github.com/BIMIB-DISCo/VERSO-UTILITIES/tree/main/REAL-WORLD_DATASETS. Please read the following instructions to perform both VERSO STEP #1 and STEP #2.

VERSO STEP #1

Requirements

if (!require("ape")) install.packages("ape")
if (!require("Rfast")) install.packages("Rfast")

Running

For each dataset, please run the R script main.R included in the corresponding subfolder as follows:

Rscript main.R

Outputs

VERSO STEP #1 returns as output an R list providing the inferred maximum log-likelihood phylogenetic tree.

VERSO STEP #2

Requirements

Running

Please launch Jupyter from the terminal with the following command:

jupyter notebook

For each dataset, please execute sequentially the Jupyter notebooks included in the related folder, named:

VERSO_STEP_2_DATASET_1.ipynb 
VERSO_STEP_2_DATASET_2.ipynb
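The instructions above describe interactive use only. As an unofficial alternative, the sketch below runs a notebook from top to bottom without opening the browser, assuming nbconvert is available in the same environment as Jupyter. The helper names and the folder argument are hypothetical and are not part of this repository; each notebook should be run from its own dataset folder.

```python
import subprocess

# Notebook names as listed above; each one lives in its own dataset folder.
NOTEBOOKS = ["VERSO_STEP_2_DATASET_1.ipynb", "VERSO_STEP_2_DATASET_2.ipynb"]


def nbconvert_command(notebook):
    """Build the command that executes one notebook in place, cell by cell."""
    return ["jupyter", "nbconvert", "--to", "notebook",
            "--execute", "--inplace", notebook]


def run_notebook(notebook, folder="."):
    """Execute a notebook from `folder` (hypothetical path); raises if a cell fails."""
    subprocess.run(nbconvert_command(notebook), cwd=folder, check=True)
```

For example, `run_notebook(NOTEBOOKS[0], folder="path/to/dataset_1_folder")` would execute the first notebook, where the folder path is a placeholder to be replaced with the actual dataset folder.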

Outputs

VERSO STEP #2 returns as output:

  • the SVG images containing the UMAP plots for the distinct clonal genotypes included in the datasets. The file names are numbered according to the clonal genotype ID: C01.svg, C02.svg, etc.
  • the distances among samples, in files numbered according to the clonal genotype ID: distances_C01.txt, distances_C02.txt, etc.
  • the metadata for each clonal genotype, in folders named: OUTPUT_C01, OUTPUT_C02, etc.
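Since all three kinds of output share the clonal genotype ID in their names, they can be collected per genotype. The following is a minimal sketch that relies only on the naming pattern listed above; the function name and the assumption that all outputs sit in a single directory are illustrative, not taken from the notebooks.

```python
import re
from pathlib import Path

# Naming assumed from the list above: C01.svg, distances_C01.txt, OUTPUT_C01.
OUTPUT_NAME = re.compile(r"^(?:(C\d+)\.svg|distances_(C\d+)\.txt|OUTPUT_(C\d+))$")


def group_outputs_by_clone(output_dir):
    """Map each clonal genotype ID (e.g. 'C01') to the sorted names of its outputs."""
    grouped = {}
    for entry in Path(output_dir).iterdir():
        match = OUTPUT_NAME.match(entry.name)
        if match is None:
            continue  # not a STEP #2 output under the assumed naming
        # Exactly one of the three capture groups holds the ID.
        clone_id = next(group for group in match.groups() if group)
        grouped.setdefault(clone_id, []).append(entry.name)
    return {clone: sorted(names) for clone, names in sorted(grouped.items())}
```

Calling `group_outputs_by_clone(".")` from the directory where the notebook wrote its results gives a quick check that every genotype has its plot, distance file and metadata folder.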

Note that the visualization of the UMAP plots may differ slightly due to different package versions.

SYNTHETIC DATASETS

To reproduce the analysis on the 80 simulated datasets via VERSO, please go to the folder named: https://github.com/BIMIB-DISCo/VERSO-UTILITIES/tree/main/SYNTHETIC_DATASETS.

The subfolder results contains an RData file named inference.RData, which provides the simulated data, the ground truth and the phylogenetic trees inferred by each method. The scripts main_part1_compute_absolute_error.R and main_part2_compute_phylogenetic_distances.R contain the code to compute the performance of each method.