R package {bigstatsr} provides functions for fast statistical analysis of large-scale data encoded as matrices. The package can handle matrices that are too large to fit in memory by memory-mapping binary files on disk. This is very similar to the big.matrix format provided by R package {bigmemory}, which is no longer used by this package (see the corresponding vignette).
Introduction to package {bigstatsr}
Note that most of the algorithms of this package don't handle missing values.
# For the current development version
devtools::install_github("privefl/bigstatsr")
As inputs, package {bigstatsr} uses Filebacked Big Matrices (FBM).
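As a minimal sketch of what working with an FBM looks like, the example below creates a small matrix backed by a file on disk, fills it, and reads part of it back. The dimensions and values are arbitrary choices for illustration, and the backing file is assumed to go to a temporary location (the default).

```r
library(bigstatsr)

# Create a 5 x 3 FBM of doubles, initialised to 0.
# With default arguments, the backing file is a temporary file on disk.
X <- FBM(5, 3, init = 0)

# Fill it using standard matrix-style indexing
X[] <- matrix(1:15, nrow = 5)

dim(X)         # dimensions of the matrix
X[1:2, ]       # read a subset into memory as a regular R matrix
X$backingfile  # path to the binary file that stores the data

# Convert an ordinary in-memory R matrix into an FBM
Y <- as_FBM(matrix(rnorm(20), nrow = 4))
```

Only the parts accessed with `[` are loaded into memory, which is what makes matrices larger than RAM workable.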
To memory-map character text files, see package {mmapcharr}.
Please open an issue if you find a bug. If you want help using {bigstatsr}, please post on Stack Overflow with the tag bigstatsr (not yet created). For guidance on writing a good question, see "How to make a great R reproducible example?".
Package {bigstatsr} uses package {foreach} for its parallelization tasks. Learn more about parallelism with {foreach} in this tutorial.
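For readers new to {foreach}, here is a generic sketch of the pattern, independent of {bigstatsr} internals. It assumes the {doParallel} package as the parallel backend and uses a small in-memory matrix `mat` invented for illustration; the number of workers (2) is also an arbitrary choice.

```r
library(foreach)
library(doParallel)

# Start a cluster of 2 workers and register it as the foreach backend
cl <- parallel::makeCluster(2)
doParallel::registerDoParallel(cl)

# Hypothetical example data: a small in-memory matrix
mat <- matrix(1:20, nrow = 4)

# Compute each column sum in parallel and combine the results with c()
col_sums <- foreach(j = seq_len(ncol(mat)), .combine = "c") %dopar% {
  sum(mat[, j])
}

# Always release the workers when done
parallel::stopCluster(cl)

# Same result as the sequential base R function
all(col_sums == colSums(mat))
```

Replacing `%dopar%` with `%do%` runs the same loop sequentially, which is convenient for debugging.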
- Computing the null space of a bigmatrix (works if one dimension is not too large)