Prior-knowledge-based cutoff optimization for correlation networks (supplementary code for doi:10.1038/s41467-020-18675-3)

CutoffOpt

This repository contains R code to replicate the findings from: Benedetti et al. "A strategy to incorporate prior knowledge into correlation network cutoff selection", Nature Communications (2020). https://www.nature.com/articles/s41467-020-18675-3

Requirements and Installation

Hardware Requirements

The code in this repository requires only a standard computer with enough RAM to support the in-memory operations.

Software Requirements

This code was developed with R version 4.0.1 and RStudio version 1.3.959, and tested on macOS (Catalina 10.15.1).

Cloning the Repository from GitHub

We recommend cloning this repository with Git, which takes only a few seconds on a personal laptop.

git clone https://github.com/krumsieklab/CutoffOpt

License

This code is released under the GPL-3.0 license.

Files

The repository includes the following code files:

  • GlycomicsResults.R -> source this script to reproduce the main glycomics results
  • TranscriptomicsResults.R -> source this script to reproduce the main transcriptomics results
  • TranscriptomicsNetworks.Rmd -> this script can be run after the previous one to interactively browse through the transcriptomics network results
  • HelperFunctions.R -> this script contains the functions used in the analysis scripts above

Result Replication

Glycomics Results

Sourcing the script GlycomicsResults.R will reproduce the main glycomics results. The preprocessed IgG glycomics data will automatically be downloaded from the figshare repository and the following files will be generated in the working directory (each file corresponds to the respective paper figure panel):

  • Figure3B.pdf
  • Figure3C.pdf
  • Figure4A.pdf
  • Figure4B.pdf
  • Figure4C.pdf
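As a quick sanity check after sourcing the script, one could confirm that the files listed above were written to the working directory. The following is a minimal sketch in base R; the helper name check_outputs is hypothetical and not part of this repository.

```r
# Expected glycomics outputs, as listed in this README.
expected_glycomics <- c("Figure3B.pdf", "Figure3C.pdf", "Figure4A.pdf",
                        "Figure4B.pdf", "Figure4C.pdf")

# Hypothetical helper (not part of the repository): reports which of the
# given files exist in `dir` as a named logical vector.
check_outputs <- function(files, dir = getwd()) {
  present <- file.exists(file.path(dir, files))
  names(present) <- files
  present
}

status <- check_outputs(expected_glycomics)
if (all(status)) {
  message("All expected figure files found.")
} else {
  message("Missing: ", paste(names(status)[!status], collapse = ", "))
}
```

The same helper can be pointed at the transcriptomics outputs by passing a different vector of file names.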

Note: The current version of the code performs nboot=10 bootstraps to compute the confidence intervals of all plots, while in the paper we used nboot=1000. Increasing the number will substantially increase the runtime, which for nboot=10 is roughly 3 hours on a MacBook Pro (macOS version 10.15.1) with a 2.3 GHz Quad-Core Intel Core i5 processor and 16GB of RAM. Because the paper figures typically report the average across the bootstrapping results, curves obtained with the default nboot value given here will be less smooth than the ones reported in the paper, but qualitatively the same.
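To illustrate why fewer bootstraps give noisier averages and intervals, here is a small self-contained sketch on simulated data. It is not the paper's pipeline: the data, the function names boot_cor and summarise_boot, and the choice of a percentile interval for a single correlation coefficient are all assumptions made for illustration.

```r
# Simulated data (assumption): two correlated variables, 200 samples.
set.seed(1)
x <- rnorm(200)
y <- 0.5 * x + rnorm(200)

# Hypothetical helper: resample the observations with replacement
# `nboot` times and recompute the Pearson correlation each time.
boot_cor <- function(x, y, nboot) {
  n <- length(x)
  vapply(seq_len(nboot), function(b) {
    idx <- sample.int(n, n, replace = TRUE)
    cor(x[idx], y[idx])
  }, numeric(1))
}

# Hypothetical helper: bootstrap average and 95% percentile interval.
summarise_boot <- function(stats) {
  c(mean  = mean(stats),
    lower = unname(quantile(stats, 0.025)),
    upper = unname(quantile(stats, 0.975)))
}

small <- summarise_boot(boot_cor(x, y, nboot = 10))
large <- summarise_boot(boot_cor(x, y, nboot = 1000))
print(rbind(nboot_10 = small, nboot_1000 = large))
```

With only 10 resamples, the average and interval bounds change noticeably from one seed to the next, whereas 1000 resamples give much more stable values.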

Metabolomics Results

The metabolomics datasets used in the paper are not publicly available due to study participant privacy policies. Data can be obtained upon request (see paper for details).

Transcriptomics Results

Sourcing the file TranscriptomicsResults.R will reproduce the main transcriptomics results. Using the default settings, the script will download the following precomputed files from figshare:

  • STRING adjacency
  • CORUM adjacency
  • Reactome pathway annotations
  • Preprocessed TCGA PANCAN12 RNA-seq data
  • Precomputed bootstrapping results

Moreover, the script will automatically generate the following files in the working directory (each file corresponds to the respective paper figure panel):

  • Figure6.pdf
  • Figure7A.pdf
  • Figure7B.html
  • Figure7C.html
  • Figure7D.html
  • Figure7E.html
  • SupplementaryData1.html -> corresponding to Supplementary Data 1 (interactive version of Figure 6)
  • SupplementaryFigure8.pdf -> corresponding to Supplementary Figure 8
  • TranscriptomicsNetworkData.Rds -> this data file is needed for the R Markdown file TranscriptomicsNetworks.Rmd, which will create an interactive Shiny app to explore the optimization and network results for all significant transcriptomics pathways
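One possible way to start the interactive document from the R console is sketched below. It assumes that TranscriptomicsNetworks.Rmd is a Shiny-enabled R Markdown document that can be served with rmarkdown::run, and that TranscriptomicsNetworkData.Rds sits in the same directory; neither assumption is confirmed here, and the wrapper launch_network_browser is hypothetical. Opening the file in RStudio and using its "Run Document" button should be an equivalent route.

```r
# Hypothetical wrapper (not part of the repository): checks that the
# R Markdown file and its data file are present, then serves the document.
launch_network_browser <- function(dir = getwd()) {
  rmd <- file.path(dir, "TranscriptomicsNetworks.Rmd")
  rds <- file.path(dir, "TranscriptomicsNetworkData.Rds")
  needed <- c(rmd, rds)
  missing <- needed[!file.exists(needed)]
  if (length(missing) > 0) {
    stop("Missing required file(s): ",
         paste(basename(missing), collapse = ", "))
  }
  if (!requireNamespace("rmarkdown", quietly = TRUE)) {
    stop("Package 'rmarkdown' is required to run the document.")
  }
  # Assumption: the Rmd is a Shiny document, so rmarkdown::run can serve it.
  rmarkdown::run(rmd)
}

# Usage, after TranscriptomicsResults.R has been sourced in the same directory:
# launch_network_browser()
```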

Notes on precomputed data: To circumvent the long computation time needed to regenerate the transcriptomics results from scratch, the current version of the TranscriptomicsResults.R script loads precomputed versions of the biological references (STRING, CORUM and pathway annotations from Reactome), of the TCGA PANCAN12 RNA-seq data corrected for covariates, and of the bootstrapping results. All these files are automatically downloaded from figshare when the script is sourced, as they were too large to be included in this repository. With these precomputed files, the figures listed above can be generated in roughly 2 minutes on a MacBook Pro (macOS version 10.15.1) with a 2.3 GHz Quad-Core Intel Core i5 processor and 16GB of RAM.

To download and generate all data from scratch with nboot=100 as in the paper, set use_precomputed=F at line 75 of TranscriptomicsResults.R and source the script. This will take roughly 2.5 days on the above-mentioned machine.