diffuStats: compute diffusion scores over networks

Introduction

The general-purpose diffuStats R package offers a collection of seven network propagation scores and five graph kernels. These are widely used in computational biology; a representative example is the propagation of genetic information (e.g. disease-associated genes) through a gene-gene or protein-protein interaction network. A distinctive feature of diffuStats is its implementation of statistically normalised scores, which address the recurrent question of what the propagation of a randomised input would look like. It offers parametric, exact z-scores as well as permutation-based empirical probabilities.
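To make the idea of normalised scores concrete, here is a minimal base-R sketch that does not call diffuStats. It assumes a textbook regularised Laplacian kernel K = (I + L)^-1 on a hypothetical 5-node path graph, takes the unnormalised scores to be K times the input, and compares them with the scores of permuted inputs. All object names (`adj`, `kernel`, `n_perm`, ...) are illustrative, and the package's own functions, kernels and defaults may differ.

```r
# Toy undirected path graph on 5 nodes: 1 - 2 - 3 - 4 - 5 (hypothetical data)
n <- 5
adj <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  adj[i, i + 1] <- 1
  adj[i + 1, i] <- 1
}

# Unnormalised graph Laplacian L = D - A
lap <- diag(rowSums(adj)) - adj

# Assumed kernel for this sketch: K = (I + L)^-1
kernel <- solve(diag(n) + lap)

# Binary input: node 1 is the only labelled (positive) node
input <- c(1, 0, 0, 0, 0)

# Unnormalised diffusion scores: kernel applied to the input
raw <- as.vector(kernel %*% input)

# Exact mean and variance of each node's score when the input labels are
# permuted uniformly at random (moments of sampling without replacement)
n1 <- sum(input)
mu <- n1 / n * rowSums(kernel)
sigma2 <- n1 * (n - n1) / (n * (n - 1)) *
  (rowSums(kernel^2) - rowSums(kernel)^2 / n)

# Parametric z-scores: observed score against its permutation null
z <- (raw - mu) / sqrt(sigma2)

# Permutation-based empirical probabilities: diffuse shuffled inputs and
# count, per node, how often the permuted score reaches the observed one
set.seed(1)
n_perm <- 1000
perm <- replicate(n_perm, as.vector(kernel %*% sample(input)))
emp_p <- (rowSums(perm >= raw) + 1) / (n_perm + 1)
```

In this toy case every row of the kernel sums to one, so all nodes share the same null mean; on real networks the null mean and variance vary from node to node, which is the kind of bias the normalised scores are meant to account for.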

The diffuStats software was published in:

Picart-Armada, S., Thompson, W. K., Buil, A., & Perera-Lluna, A. (2018). diffuStats: an R package to compute diffusion-based scores on biological networks. Bioinformatics, 34(3), 533-534.

General guidelines on how to choose the scores, along with mathematical properties of the normalised and unnormalised scores, were published in:

Picart-Armada, S., Thompson, W. K., Buil, A., & Perera-Lluna, A. (2020). The effect of statistical normalisation on network propagation scores. Bioinformatics, btaa896.

From versions 1.10.2/1.11.2 onwards, diffuStats provides functions to export the exact statistical moments (means and variances); see ?moments. Users can now characterise the systematic biases of the diffusion scores in their domain of application.

Installation

diffuStats is part of Bioconductor and can be installed using:

BiocManager::install("diffuStats")

For the development version, you can also install the package through R CMD INSTALL or through devtools::install_github("b2slab/diffuStats"), which points to its GitHub repository.

Getting started

diffuStats is suitable for medium-sized networks (thousands of nodes) and is designed for biological networks. Its limitations come from the kernel formalism: networks exceeding 20k nodes start to require large kernel matrices in memory.
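As a rough back-of-the-envelope check of that limit, the sketch below estimates the size of a single kernel, assuming it is held as a dense n x n matrix of double-precision numbers (8 bytes per entry). The helper `kernel_gb` is hypothetical and not part of the package.

```r
# Approximate memory (in GB) of a dense n x n matrix of doubles
# Assumption: 8 bytes per entry, no sparse storage, no extra copies
kernel_gb <- function(n_nodes) n_nodes^2 * 8 / 1e9

# Sizes for 1k, 5k and 20k nodes: about 0.008, 0.2 and 3.2 GB
kernel_gb(c(1e3, 5e3, 2e4))
```

Memory grows quadratically with the number of nodes, and intermediate copies made during computation can push actual usage above these figures.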

Get started by looking at the package vignettes (intro for a quickstart, diffuStats for the complete documentation) or its help page:

?diffuStats

News

The NEWS.md file keeps track of the additions and bug fixes in each package version.