
R package for the simulation of metagenomic samples


SIMBA

Overview

SIMBA is an R package for the Simulation of Microbiome data with Biological Accuracy.

Based on real data, the package simulates new metagenomic data by re-sampling real samples and then implants differentially abundant features. Additionally, the package can simulate confounding factors based on metadata variables in the real data. The simulations are stored in an .h5 file, which is then the basis for downstream benchmarking, involving i) reality assessment of the simulations, ii) testing for differential abundance, and iii) evaluation of the output from differential abundance testing methods.
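The idea of re-sampling real samples and then implanting a signal can be illustrated with a rough base-R sketch. This is not the SIMBA API or the package's actual implementation: the toy count matrix, the variable names (`real_counts`, `sim_counts`, `implanted`, `effect_size`), and the simple multiplicative effect are all assumptions made for illustration.

```r
set.seed(42)

# Toy "real" count matrix: 50 features (rows) x 20 samples (columns).
# All names here are hypothetical and not part of the SIMBA API.
n_features <- 50
n_real <- 20
real_counts <- matrix(
  rnbinom(n_features * n_real, mu = 100, size = 0.5),
  nrow = n_features,
  dimnames = list(paste0("feat_", seq_len(n_features)),
                  paste0("real_", seq_len(n_real)))
)

# Step 1: re-sample real samples (columns) with replacement.
n_sim <- 40
idx <- sample(seq_len(n_real), size = n_sim, replace = TRUE)
sim_counts <- real_counts[, idx, drop = FALSE]
colnames(sim_counts) <- paste0("sim_", seq_len(n_sim))

# Step 2: assign the simulated samples to two balanced groups at random.
group <- sample(rep(c("control", "case"), each = n_sim / 2))
is_case <- group == "case"

# Step 3: implant a signal by scaling a few features in the case group.
# The fold change of 4 is an arbitrary, assumed effect size.
implanted <- sample(rownames(sim_counts), size = 5)
effect_size <- 4
sim_counts[implanted, is_case] <- sim_counts[implanted, is_case] * effect_size
```

Because the implanted features are recorded in `implanted`, they can serve as the ground truth when differential abundance methods are evaluated later.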

Installation

SIMBA was built using R version 4.0 and should run on any operating system that supports R. It is available on GitHub and can be installed with devtools:

require("devtools")
devtools::install_github(repo = 'zellerlab/SIMBA')

Estimated installation time on a typical desktop computer: 12 seconds.

Additionally, the package has been submitted to CRAN under the name simbaR.

Instructions

A typical SIMBA workflow consists of four steps, which are explained in more detail in the vignette, using a toy example:

  1. Using a real dataset, SIMBA simulates data for benchmarking
  2. SIMBA performs a reality assessment of the simulated data
  3. Various differential abundance testing methods are applied to the simulations
  4. The output of the differential abundance testing methods is evaluated

Please see the vignette for more detail.
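Steps 3 and 4 of the workflow can be sketched in base R as follows. This is only an illustration under stated assumptions, not the SIMBA interface: the simulated matrix, the choice of the Wilcoxon test with Benjamini-Hochberg correction, and the names `truth`, `called`, `recall`, and `fdr` are hypothetical stand-ins for what the package and vignette provide.

```r
set.seed(1)

# Hypothetical simulated data: 100 features x 60 samples, with the first
# 10 features carrying an implanted signal in the case group.
n_features <- 100
n_samples <- 60
group <- factor(rep(c("control", "case"), each = n_samples / 2))
counts <- matrix(rlnorm(n_features * n_samples, meanlog = 3, sdlog = 1),
                 nrow = n_features)
truth <- seq_len(n_features) <= 10
counts[truth, group == "case"] <- counts[truth, group == "case"] * 5

# Step 3: apply a differential abundance test to every feature
# (here a Wilcoxon rank-sum test, chosen only as an example).
p_values <- apply(counts, 1, function(x) wilcox.test(x ~ group)$p.value)
p_adj <- p.adjust(p_values, method = "BH")

# Step 4: compare the significant calls with the known ground truth.
called <- p_adj < 0.05
true_pos <- sum(called & truth)
false_pos <- sum(called & !truth)
recall <- true_pos / sum(truth)
fdr <- if (sum(called) > 0) false_pos / sum(called) else 0
```

Since the ground truth is known by construction, recall and the observed false discovery rate can be computed directly, which is what makes simulated data useful for comparing testing methods.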

Additionally, check out the BAMBI repository on GitHub, which contains the scripts for the large benchmarking effort reported in our preprint.

Feedback and Contact

If you have any question about SIMBA, if you run into any issue, or if you would like to make a feature request, please:

License

SIMBA is distributed under the GPL-3 license.

Citation

If you use SIMBA, please cite:

Wirbel J, Essex M, Forslund SK, Zeller G. Evaluation of microbiome association models under realistic and confounded conditions. bioRxiv (2022). https://doi.org/10.1101/2022.05.09.491139