/MatrixEval2017

Evaluating access and memory usage of different matrix types

Evaluating the performance of different matrix types

Overview

This repository contains the code for the paper "beachmat: a Bioconductor C++ API for accessing high-throughput biological data from a variety of R matrix types" by Lun et al. (2018).

The code measures the performance of row and column access for different matrix types, using both simulated and real data sets. To run the tests on your own machine, follow the instructions below.
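To illustrate why row and column access can differ between matrix types, here is a minimal, self-contained sketch. It is not beachmat's API: `DenseMatrix`, `SparseMatrix`, `get_row` and `get_col` are hypothetical names chosen for this example. It assumes a column-major dense layout (as in ordinary R matrices) and a compressed sparse column layout (as in `dgCMatrix`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense matrix stored column-major, like an ordinary R matrix.
struct DenseMatrix {
    std::size_t nrow, ncol;
    std::vector<double> values; // length nrow * ncol

    // Column access: a single contiguous copy.
    std::vector<double> get_col(std::size_t c) const {
        assert(c < ncol);
        auto start = values.begin() + c * nrow;
        return std::vector<double>(start, start + nrow);
    }

    // Row access: strided reads, one element from each column.
    std::vector<double> get_row(std::size_t r) const {
        assert(r < nrow);
        std::vector<double> out(ncol);
        for (std::size_t c = 0; c < ncol; ++c) out[c] = values[c * nrow + r];
        return out;
    }
};

// Hypothetical compressed sparse column (CSC) matrix.
struct SparseMatrix {
    std::size_t nrow, ncol;
    std::vector<std::size_t> col_ptr; // length ncol + 1; start of each column
    std::vector<std::size_t> row_idx; // row of each non-zero, sorted within a column
    std::vector<double> values;       // non-zero values, same order as row_idx

    // Column access: walk the non-zeros of one column.
    std::vector<double> get_col(std::size_t c) const {
        assert(c < ncol);
        std::vector<double> out(nrow, 0.0);
        for (std::size_t k = col_ptr[c]; k < col_ptr[c + 1]; ++k) out[row_idx[k]] = values[k];
        return out;
    }

    // Row access: a binary search in every column, hence more work per call.
    std::vector<double> get_row(std::size_t r) const {
        assert(r < nrow);
        std::vector<double> out(ncol, 0.0);
        for (std::size_t c = 0; c < ncol; ++c) {
            auto first = row_idx.begin() + col_ptr[c];
            auto last = row_idx.begin() + col_ptr[c + 1];
            auto it = std::lower_bound(first, last, r);
            if (it != last && *it == r) out[c] = values[it - row_idx.begin()];
        }
        return out;
    }
};
```

In this sketch, column access touches contiguous memory in both layouts, whereas row access is strided in the dense case and needs one search per column in the sparse case. Asymmetries of this kind are what the timing scripts are meant to quantify.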

Setup

  1. Install beachmat from Bioconductor.
  2. Change into the timings directory and run R CMD INSTALL --clean package. RcppArmadillo and RcppEigen must be installed beforehand.

Running simulations

  • timings/ contains scripts that time (in milliseconds) data access from different matrix representations.
  • timings/chunking/ contains scripts that time rechunking and check the chunk cache logic.
  • memory/ contains scripts that measure memory usage for different matrix representations.
  • miscellaneous/ contains scripts that compare timings to R and verify the no-copy access method of RcppArmadillo and RcppEigen.
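As a rough illustration of the two quantities these scripts report, the sketch below shows one way to time a call in milliseconds and to estimate the storage needed by a dense versus a compressed sparse column matrix. It is not taken from the repository: `time_ms`, `dense_bytes` and `csc_bytes` are hypothetical helpers, and the memory estimates assume double-precision values with `int` indices and ignore any object overhead.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: run `f` `reps` times and return the mean
// wall-clock time per run in milliseconds.
template <typename F>
double time_ms(F&& f, int reps = 10) {
    assert(reps > 0);
    using clock = std::chrono::steady_clock;
    auto start = clock::now();
    for (int i = 0; i < reps; ++i) f();
    std::chrono::duration<double, std::milli> elapsed = clock::now() - start;
    return elapsed.count() / reps;
}

// Approximate bytes for a dense matrix of doubles.
inline std::size_t dense_bytes(std::size_t nrow, std::size_t ncol) {
    return nrow * ncol * sizeof(double);
}

// Approximate bytes for a CSC matrix: one double and one int row index
// per non-zero, plus ncol + 1 int column pointers (assumed layout).
inline std::size_t csc_bytes(std::size_t ncol, std::size_t nnz) {
    return nnz * (sizeof(double) + sizeof(int)) + (ncol + 1) * sizeof(int);
}
```

For example, `time_ms([&] { mat.get_row(0); })` would time a row access on some matrix object `mat`, and comparing `csc_bytes(ncol, nnz)` with `dense_bytes(nrow, ncol)` shows how the sparse layout saves memory only when the fraction of non-zero entries is small.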

Running real analyses

Zeisel dataset

Enter real/zeisel and download the count matrix for the Zeisel data set.

  • Run the zeisel_time.R script to generate timings (in milliseconds) for matrix access on this data set. The script also measures memory usage for each matrix representation.
  • Run the detection_stats.R script to generate timings (in milliseconds) for computing various cell- or gene-based statistics from this data set.
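The exact statistics computed by detection_stats.R are not listed here. As an assumption for illustration only, the sketch below counts the non-zero entries per column and per row of a column-major count matrix, treating rows as genes and columns as cells. `DetectionStats` and `count_detected` are hypothetical names, not functions from the repository.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: detection counts per cell and per gene.
struct DetectionStats {
    std::vector<std::size_t> genes_per_cell; // non-zero entries in each column
    std::vector<std::size_t> cells_per_gene; // non-zero entries in each row
};

// Assumes `counts` is column-major with nrow genes and ncol cells.
inline DetectionStats count_detected(const std::vector<double>& counts,
                                     std::size_t nrow, std::size_t ncol) {
    assert(counts.size() == nrow * ncol);
    DetectionStats out{std::vector<std::size_t>(ncol, 0),
                       std::vector<std::size_t>(nrow, 0)};
    // Iterate column by column so that memory is read contiguously.
    for (std::size_t c = 0; c < ncol; ++c) {
        for (std::size_t r = 0; r < nrow; ++r) {
            if (counts[c * nrow + r] > 0) {
                ++out.genes_per_cell[c];
                ++out.cells_per_gene[r];
            }
        }
    }
    return out;
}
```

A single pass over the columns yields both the cell-based and the gene-based counts, which avoids a separate and slower row-wise traversal of column-major data.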

10X dataset

Enter real/10X and install TENxBrainData. See the README.md file in that directory for the order in which to run the various R Markdown scripts.