Computational efficiency of some microbiome data science techniques in R

Overview

TreeSummarizedExperiment (TreeSE) and phyloseq (pseq) objects are alternative containers for microbiome data. Here we compare their computational efficiency across varying numbers of samples and features.

Analysis method

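Multiple data sets, stored either as TreeSE or as phyloseq objects, were processed through a few common analytical routines: melting, CLR transformation, agglomeration to Phylum level, alpha diversity estimation (Shannon), and beta diversity estimation (Bray-Curtis / MDS).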

The data sets were split by taxonomic rank to vary the number of features while keeping the data set and sample sizes constant. Execution times were measured and recorded for each method and each combination of sample and feature counts.
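As a rough illustration of this kind of timing loop, the sketch below records elapsed times for two routines over a small grid of feature and sample counts. It is not the code used in this repository: it runs in base R on simulated count matrices rather than on TreeSE or phyloseq objects, and the helper names (simulate_counts, shannon, clr, time_routine) and the grid sizes are hypothetical choices made for the example.

```r
# Simulate a features x samples count matrix (hypothetical stand-in for real data).
simulate_counts <- function(n_features, n_samples, seed = 1) {
  set.seed(seed)
  matrix(rpois(n_features * n_samples, lambda = 5),
         nrow = n_features, ncol = n_samples)
}

# Shannon diversity per sample (column), using natural logarithms.
shannon <- function(counts) {
  apply(counts, 2, function(x) {
    p <- x[x > 0] / sum(x)
    -sum(p * log(p))
  })
}

# CLR transformation per sample, with a pseudocount to avoid log(0).
clr <- function(counts, pseudocount = 1) {
  logged <- log(counts + pseudocount)
  sweep(logged, 2, colMeans(logged), "-")
}

# Time one routine on one simulated data set; returns elapsed seconds.
time_routine <- function(fun, n_features, n_samples) {
  counts <- simulate_counts(n_features, n_samples)
  unname(system.time(fun(counts))["elapsed"])
}

# Record timings for every method and feature/sample count combination.
routines <- list(shannon = shannon, clr = clr)
grid <- expand.grid(method = names(routines),
                    n_features = c(100, 1000),
                    n_samples = c(10, 100),
                    stringsAsFactors = FALSE)
grid$elapsed <- mapply(
  function(method, n_features, n_samples) {
    time_routine(routines[[method]], n_features, n_samples)
  },
  grid$method, grid$n_features, grid$n_samples,
  USE.NAMES = FALSE
)
print(grid)
```

A single system.time call is noisy for fast routines, so repeating each measurement and summarising the repeats (for example with the median) would give more stable numbers.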

Benchmarking results

Standard data sets:

Melting
CLR transformation
Agglomeration to Phylum level
Alpha diversity estimation (Shannon)
Beta diversity estimation (Bray-Curtis / MDS)

Big data set:

Melting
CLR transformation
Agglomeration to Phylum level
Alpha diversity estimation (Shannon)

How to run this analysis locally

To reproduce the analyses, start R from within your local copy of this repository and run:

source("main.R")

License