
R package for statistical tools with big matrices stored on disk.



bigstatsr

R package {bigstatsr} provides functions for fast statistical analysis of large-scale data encoded as matrices. The package can handle matrices that are too large to fit in memory by memory-mapping them to binary files on disk. This format is very similar to the big.matrix format provided by R package {bigmemory}, which this package no longer uses (see the corresponding vignette).

Introduction to package {bigstatsr}

LIST OF FEATURES

Note that most of the algorithms of this package don't handle missing values.
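
Because of this, it can be worth checking a matrix for missing values before running an analysis. The following is a minimal sketch, not part of the package: it assumes the FBM() constructor and standard matrix-style indexing from {bigstatsr}, and the block size of 5 columns is an arbitrary value chosen for illustration.

```r
library(bigstatsr)

# Small example matrix on disk, with one missing value added for illustration.
X <- FBM(100, 20, init = 0)
X[5, 3] <- NA_real_

# Scan the columns in blocks so that only part of the matrix is read
# into memory at any time.
block_size <- 5  # hypothetical block size
starts <- seq(1, ncol(X), by = block_size)
n_missing <- unlist(lapply(starts, function(start) {
  ind <- start:min(start + block_size - 1, ncol(X))
  colSums(is.na(X[, ind]))
}))

# Indices of the columns that contain at least one missing value.
which(n_missing > 0)
```

Columns flagged this way would need to be imputed or excluded before calling functions that do not handle missing values.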

Installation

# For the current development version
devtools::install_github("privefl/bigstatsr")

Input format

As inputs, package {bigstatsr} uses Filebacked Big Matrices (FBM).
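
As an illustration, an FBM can be created and accessed much like a standard R matrix. This is a minimal sketch assuming the FBM() and as_FBM() functions of {bigstatsr}; the dimensions and values are arbitrary, and the backing files are written to the default temporary location.

```r
library(bigstatsr)

# Create a 10 x 5 FBM of doubles, initialised to 0. The data are stored
# in a binary file on disk rather than in memory.
X <- FBM(10, 5, init = 0)

# Matrix-style accessors read from and write to the backing file.
X[, 1] <- as.numeric(1:10)
X[1:3, 1]
dim(X)

# An existing R matrix can also be converted to an FBM.
Y <- as_FBM(matrix(rnorm(20), nrow = 4))
Y$backingfile  # path of the file that backs Y
```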

To memory-map character text files, see package {mmapcharr}.

Bug report / Help

Please open an issue if you find a bug. If you want help using {bigstatsr}, please post on Stack Overflow with the tag bigstatsr (not yet created). For guidance on writing a question that others can run, see "How to make a great R reproducible example?".

Use cases

Parallelisation

Package {bigstatsr} uses package {foreach} for its parallel tasks. To learn more about parallelism with {foreach}, see this tutorial.
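
For readers new to {foreach}, here is a minimal generic sketch of a parallel loop. It does not show how {bigstatsr} uses {foreach} internally; the {doParallel} backend and the choice of 2 workers are assumptions made for this example.

```r
library(foreach)
library(doParallel)

# Start a cluster of 2 workers (an arbitrary choice) and register it
# as the backend used by %dopar%.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Each iteration is sent to a worker; results are combined with c().
squares <- foreach(i = 1:4, .combine = "c") %dopar% {
  i^2
}

# Always release the workers once the computation is done.
parallel::stopCluster(cl)

squares
```

Replacing %dopar% with %do% runs the same loop sequentially, which is convenient for debugging.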

Large datasets