High dimensional precision matrix estimation with a known graphical structure
- Works in very high dimensions
- Non-parametric
- Asymptotic regularization 🖖
- Lightning fast
- Both Python and R
Available as header-only in C++, as a Python-package, or as an R-package.
Python:
GraphSPME is available on PyPI:
pip install GraphSPME
R:
Soon:
- R-package on CRAN
In the meantime, follow the developer instructions for installing GraphSPME. Note that GraphSPME relies heavily on Eigen. See dependencies for details.
See GraphSPME-examples for R and Python example code.
We use the AR1 process as an example, as it has known covariance, precision, and graphical structure. Simulate a zero-mean AR1 process with a known graphical structure:
library(Matrix)
# Zero mean AR1
rar1 <- function(n, psi){
x <- numeric(n)
x[1] <- rnorm(1, 0, 1/sqrt(1-psi^2))
for(i in 2:n){
x[i] <- psi*x[i-1] + rnorm(1)
}
return(x)
}
# Simulate data
set.seed(123)
p <- 5
psi <- 0.8
n <- 200
x <- t(replicate(n, rar1(p,psi)))
Z <- bandSparse(p, p, (-1):1,
list(rep(1,p-1),
rep(1,p),
rep(1,p-1)))
The graphical structure of the data is contained in Z
, which shows
the non-zero elements of the precision matrix.
Such information is typically known in real-world problems.
Z
# 5 x 5 sparse Matrix of class "dgCMatrix"
#
# [1,] 1 1 . . .
# [2,] 1 1 1 . .
# [3,] . 1 1 1 .
# [4,] . . 1 1 1
# [5,] . . . 1 1
The exact dependence-structure is however typically unknown.
GraphSPME therefore estimates a non-parametric estimate of the precision matrix
using the prec_sparse()
function.
library(GraphSPME)
prec_est <- prec_sparse(x, Z)
prec_est
# 5 x 5 sparse Matrix of class "dgCMatrix"
#
# [1,] 0.94 -0.85 . . .
# [2,] -0.83 1.73 -0.78 . .
# [3,] . -0.86 1.58 -0.73 .
# [4,] . . -0.70 1.54 -0.78
# [5,] . . . -0.76 1.02
Note that GraphSPME allows working in very high dimensions:
frobenius_norm <- function(M) sum(M^2)
set.seed(123)
p <- 10000
x <- t(replicate(n, rar1(p,psi)))
Z <- bandSparse(p, p, (-1):1,
list(rep(1,p-1),
rep(1,p),
rep(1,p-1)))
dim(Z)
# [1] 10000 10000
start.time <- Sys.time()
prec_est <- prec_sparse(x, Z)
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
# Time difference of 0.9 secs
# We can compare with the population precision
prec_pop <- bandSparse(p, p, (-1):1,
list(rep(-psi,p-1),
c(1, rep(1+psi^2,p-2), 1),
rep(-psi,p-1)))
frobenius_norm(prec_pop-prec_est)
# [1] 533
GraphSPME is built on the linear algebra library Eigen. In particular the sparse matrix class in Eigen is extensively utilized to obtain efficient and scalable result. For the R-package, RcppEigen is employed for both bindings to R and access to Eigen (no manual installation needed). Bindings to Python are done via PyBind, and here Eigen must be installed manually beforehand.
Clone the repository
git clone git@github.com:equinor/GraphSPME.git
To build the R-package, open the R-package.Rproj
as a project in RStudio. Under Build
hit install and restart
.
To build it manually, or for CRAN, use the Makefile
by running
make Rbuild # to bundle, or
make Rinstall # to bundle and install, or
make Rcheck # to bundle and check if package ready for CRAN
To build the Python-package, first make sure that Eigen is installed and available in the include path. See the Eigen getting started documentation for details. Then GraphSPME is installable in editable mode by
pip install -e ./python-package
GraphSPME combines the inversion scheme in Le (2021) with the regularized covariance estimate of Touloumis (2015) as described in . The package leverages Eigen to obtain scalable result by numerically taking advantage of the graphical nature of the problem.
- Le (2021) introduces a sparse precision matrix estimate from the ml-estimate covariance matrix. The method utilizes the knowledge of a graphical structure beneath the realized data.
- Touloumis (2015) finds asymptotic closed form results for schrinkage of the frequentist covariance estimate.