bwlewis/irlba

Possible performance issue in irlba with sparse matrices

eromero-vlc opened this issue · 3 comments

I think irlba takes too much time with sparse matrices. Here is a comparison with RSpectra:

require(irlba)
require(RSpectra)
require(Matrix)

A <- as(sparseMatrix(i=1:5000,j=1:5000,x=1:5000), "dgCMatrix");
set.seed(1)
system.time(r<-irlba(A,40,tol=1e-5,verbose=TRUE))
#> Working dimension size 47
#> Initializing starting vector v with samples from standard normal distribution.
#> Use `set.seed` first for reproducibility.
#> irlba: using fast C implementation
#>    user  system elapsed
#>   7.696   0.000   7.680
r$mprod
#> [1] 3796
set.seed(1)
system.time(r<-irlba(A,40,tol=1e-5,work=80,verbose=TRUE))
#> Working dimension size 80
#> Initializing starting vector v with samples from standard normal distribution.
#> Use `set.seed` first for reproducibility.
#> irlba: using fast C implementation
#>    user  system elapsed
#>   1.904   0.000   1.905
r$mprod
#> [1] 1424
system.time(r<-RSpectra::svds(A,40,tol=1e-5))
#>    user  system elapsed
#>   0.192   0.000   0.193
r$nops
#> [1] 1141

Increasing the maximum basis size reduces the time, but irlba is still about ten times slower than RSpectra. I suspect an issue with the C matvec implementation or with the restarting.

Thanks. First, please note that RSpectra has some other issues; see https://bwlewis.github.io/irlba/comparison.html.

Oddly, I get nearly the opposite result on my system:

require(irlba)
## Loading required package: irlba
## Loading required package: Matrix
require(RSpectra)
## Loading required package: RSpectra
require(Matrix)
set.seed(1)
A <- as(sparseMatrix(i=1:5000,j=1:5000,x=1:5000), "dgCMatrix")
set.seed(1)
system.time(r<-irlba(A,40,tol=1e-5))
##   user  system elapsed 
##  8.304   2.364   2.677 
set.seed(1)
system.time(r<-irlba(A,40,tol=1e-5,work=80))
##   user  system elapsed 
##  2.428   0.504   0.735 
system.time(r<-RSpectra::svds(A,40,tol=1e-5))
##   user  system elapsed 
## 15.068   0.000  15.090 

This was tested with:

R.version
               _                                       
platform       x86_64-pc-linux-gnu                     
arch           x86_64                                  
os             linux-gnu                               
system         x86_64, linux-gnu                       
status         beta                                    
major          3                                       
minor          4.0                                     
year           2017                                    
month          04                                      
day            08                                      
svn rev        72499                                   
language       R                                       
version.string R version 3.4.0 beta (2017-04-08 r72499)


packageDescription("irlba")
Package: irlba
Type: Package
Title: Fast Truncated SVD, PCA and Symmetric Eigendecomposition for
        Large Dense and Sparse Matrices
Version: 2.2.0

Package: RSpectra
Type: Package
Title: Solvers for Large Scale Eigenvalue and SVD Problems
Version: 0.12-0

on my quad-core AMD Athlon A10-7850K home PC with 16 GB RAM.

I can think of two things that might account for this:

  1. I used the version of irlba from GitHub, which may be a bit faster than the CRAN version.
  2. irlba uses the same BLAS/LAPACK libraries that R does. On my system I use OpenBLAS (http://www.openblas.net/). If R is installed with the reference default BLAS, this can be very slow (for lots of other R functions too).

Can you check which BLAS/LAPACK library your R is using?
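
For reference, one quick way to check (in R 3.4.0 and later, `sessionInfo()` reports the BLAS/LAPACK shared libraries in use; the shell alternative assumes R was built with a shared libR, which is the case for most distribution packages):

    ## In R (>= 3.4.0): the tail of sessionInfo() names the BLAS and
    ## LAPACK shared objects R is linked against.
    sessionInfo()

    ## Alternatively, from a shell, inspect what libR.so links to
    ## (the exact path varies by distribution):
    ## ldd $(R RHOME)/lib/libR.so | grep -i -E 'blas|lapack'

If the output mentions the reference `libRblas.so` rather than something like OpenBLAS or ATLAS, that would explain the slow timings.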

Any other ideas?

FYI here is an old note I wrote on how I like to configure BLAS for R:

http://illposed.net/r-on-linux.html
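
For what it's worth, on some distributions you can also switch the BLAS backend without recompiling R at all. This is a sketch for Debian/Ubuntu using `update-alternatives` (package and alternative names are specific to those distributions; SUSE packaging differs):

```shell
# Install OpenBLAS (Debian/Ubuntu package name; adjust for your distro).
sudo apt-get install libopenblas-base

# Point the libblas.so.3 and liblapack.so.3 alternatives at OpenBLAS.
# R -- and irlba's compiled code -- picks up the new backend at the
# next R session, with no recompilation needed.
sudo update-alternatives --config libblas.so.3
sudo update-alternatives --config liblapack.so.3
```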

P.S. Sorry about the long latency in replying.

I was using the R build from the SUSE distribution, which is apparently configured with an unoptimized reference BLAS. After recompiling R with OpenBLAS, the performance is quite similar to what you report. Thanks!