bwlewis/irlba

Using warm starts.

privefl opened this issue · 3 comments

I'm very interested in the v parameter of the irlba function.
I'm working on an algorithm that requires computing a partial SVD over many iterations, each time on a matrix that differs only slightly from the one in the previous iteration.
So I expect the SVD to be very similar between iterations, which is why I would like to use warm starts to improve performance.

I tried:

A <- matrix(0, 5e3, 5e3); A[] <- rnorm(length(A))
print(system.time(
  S <- irlba(A, 10)  
)) # 7 sec
print(system.time(
  S2 <- irlba(A, 10, v = S$v)  
)) # 6 sec

I'm a bit surprised that it is not much faster because I gave it the previous answer.
Am I missing something?

Sorry for the super-long latency. Crazy summer.

A few notes:

Passing v = S$v above only uses the first singular vector as a starting vector for the Lanczos process. This can indeed help a bit, but it doesn't carry over the whole subspace associated with the first 10 singular vectors.
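
In other words, given the behavior described above, that call is effectively equivalent to passing just the leading vector:

S2 <- irlba(A, 10, v = S$v[, 1])   # only the first column of S$v is used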

But! The algorithm can restart using the entire subspace from a previous run: instead of specifying a single starting vector as above, pass the whole output of a previous run as v. Here is an example:

library(irlba)
set.seed(1) 
A <- matrix(0, 5e3, 5e3); A[] <- rnorm(length(A))

set.seed(1)
system.time(L1 <- irlba(A, 50, work=100))   # baseline time, note extra work required for this problem
#  user  system elapsed
# 46.160   0.048  11.745

set.seed(1)
system.time(L <- irlba(A, 30, work=100))
#   user  system elapsed 
# 40.040   0.012  10.144 

set.seed(1)
system.time(L2 <- irlba(A, 50, v=L, work=100))   # with help from starting subspace
#   user  system elapsed 
# 34.736   0.024   8.835 

I admit that with this particular problem the result is less than impressive (although we do see a small savings).

A note about this problem: the largest singular values that we're looking for are relatively clustered. For example:

 summary(L1$d)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#  135.7   136.7   137.9   138.0   139.3   141.5 

Because there is not much space between the singular values, irlba has a hard time converging on the subspace. That's why I needed to increase the work parameter above. (Try it without this; irlba will report an informative warning.)
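
For example, a quick check (using the same A as above; the exact warning text may vary):

set.seed(1)
L0 <- irlba(A, 50)   # default work: expect an informative convergence warning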

So one reason that the method is slowish on this problem is the unfortunate distribution of singular values. I think that restarting will work much better for problems with better singular value distributions.
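
To illustrate that claim, here is a hedged sketch (not a benchmark) that constructs a matrix with well-separated leading singular values and compares a cold start against a restart:

set.seed(1)
n <- 2000; k <- 30
U <- qr.Q(qr(matrix(rnorm(n * k), n)))   # random orthonormal bases
V <- qr.Q(qr(matrix(rnorm(n * k), n)))
B <- U %*% (2^(k:1) * t(V)) + 0.01 * matrix(rnorm(n^2), n)   # well-separated spectrum plus noise

system.time(W1 <- irlba(B, 20))           # cold start
system.time(W2 <- irlba(B, 30, v = W1))   # restart from the previous output
c(W1$mprod, W2$mprod)                     # compare matrix-vector product counts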

I hope this helps! Again, sorry for taking all summer.

P.S. There is an open issue about the singular value distribution problem; see #19.

Also note that the package includes an alternative algorithm by Martinsson and colleagues, called svdr, which can perform better in the clustered singular value case. See ?svdr for details.
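
For example, on the same clustered problem (the it and extra values here are illustrative choices, not tuned):

set.seed(1)
S3 <- svdr(A, 50, it = 100, extra = 50)
summary(S3$d)   # compare with summary(L1$d) above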

Thanks for your answer.

I think I had already tried what you mention.
But I don't want to increase the number of singular values/vectors returned.
I want K = 10 PCs every time, but each time from a matrix where ~10% of the values have changed.
So the first PCs are not exactly the same, but they don't change much from one round to the next.
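
Something like this sketch, where perturb() is a hypothetical stand-in for whatever changes ~10% of the values each round:

S <- irlba(A, 10)
for (round in 1:100) {
  A <- perturb(A)              # hypothetical: updates ~10% of the entries
  S <- irlba(A, 10, v = S$v)   # warm start from the previous solution
}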

OK, sorry I misunderstood. I see your question now.

The initial vector can help, but the irlba implementation uses matrix-vector products. A block implementation would probably benefit more, because we could supply a whole initial subspace as a guess.

The package has an alternative algorithm, svdr, by Martinsson and colleagues, that is a block method. It's not Lanczos, just basic subspace iteration, but it can work better than irlba in certain situations. And I think that method benefits more from a good starting guess from a previous run.

See the code below (you'll need to grab the latest version from GitHub; I recently added a restart option to the svdr method for this issue).

As you discovered, irlba doesn't benefit all that much from restarting. The example below takes 278 matrix-vector products for the 'y' matrix with a random initial vector, and only drops to 250 products with a warm-start vector from the nearby 'x' matrix solution. A very modest savings.

The svdr method benefits a lot more. It takes 101 matrix-matrix products with a random initial subspace, but drops to only 41 products with a warm-start subspace from the nearby 'x' matrix solution.

The svdr implementation is unfortunately not nearly as polished as the irlba one. It really needs the following things to be implemented to make it practical to use (in order of importance):

  1. A basic on-line convergence criterion (based on the subspace or the singular values or both, similar to what irlba uses), rather than just a fixed number of iterations
  2. A good C or Fortran implementation to speed up the loop as much as possible, similar to irlba

Problem 1 is really the big issue: we have to guess at the number of iterations required to get a decent solution. Amazingly, this is also how the Python implementation of this algorithm works; see http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html (which makes me wonder how valuable solutions using that code really are)...
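
For instance, here is a minimal sketch of the kind of stopping rule I mean, wrapping repeated short svdr runs restarted from the previous subspace Q (the function name, block_it, and tol are illustrative choices, not package API):

svdr_until_converged <- function(A, k, tol = 1e-6, extra = 10,
                                 block_it = 5, max_blocks = 40) {
  s <- svdr(A, k, it = block_it, extra = extra, return.Q = TRUE)
  for (i in seq_len(max_blocks)) {
    s_new <- svdr(A, k, it = block_it, extra = extra, Q = s$Q, return.Q = TRUE)
    # stop when the estimated singular values stabilize between blocks
    if (max(abs(s_new$d - s$d)) < tol * max(s_new$d)) return(s_new)
    s <- s_new
  }
  s
}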

I'll think about adding a reasonable convergence criterion and let you know what I come up with. Of course as always, feel free to beat me to this and send a PR!

Best,

Bryan

library(irlba)

N = 1000
set.seed(1)
x = matrix(rnorm(N^2), N)
y = x
y[sample(length(y), N)] = runif(N)

cat("Reference full SVDs\n")
print(system.time(svdx <- svd(x)))
print(system.time(svdy <- svd(y)))

cat("\nIRLBA 10\n")
set.seed(1)
cat("time:   \t", system.time(lx <- irlba(x, 10))[3], "\n")      # total time
cat("m-v prod: \t", lx$mprod, "\n")                            # matrix-vector products
cat("sv err: \t", sqrt(crossprod(lx$d - svdx$d[1:10])), "\n")  # a measure of error

cat("\nIRLBA 10 on perturbed matrix\n")
set.seed(1)
cat("time:   \t", system.time(ly <- irlba(y, 10))[3], "\n")      # total time
cat("m-v prod: \t", ly$mprod, "\n")                            # matrix-vector products
cat("sv err: \t", sqrt(crossprod(ly$d - svdy$d[1:10])), "\n")  # a measure of error

cat("\nIRLBA 10 on perturbed matrix with starting vector\n")
cat("time:   \t", system.time(lyr <- irlba(y, 10, v=lx$v[, 1]))[3], "\n")
cat("m-v prod: \t", lyr$mprod, "\n")
cat("sv err: \t", sqrt(crossprod(lyr$d - svdy$d[1:10])), "\n")

cat("\nsvdr 10\n")
set.seed(1)
cat("time:   \t", system.time(sx <- svdr(x, 10, it=50, extra=30, return.Q=TRUE))[3], "\n")
cat("m-m prod: \t", sx$mprod, "\n")                   # matrix-matrix products
cat("sv err: \t", sqrt(crossprod(sx$d - svdx$d[1:10])), "\n")

cat("\nsvdr 10 on perturbed matrix\n")
set.seed(1)
cat("time:   \t", system.time(sy <- svdr(y, 10, it=50, extra=30))[3], "\n")
cat("m-m prod: \t", sy$mprod, "\n")
cat("sv err: \t", sqrt(crossprod(sy$d - svdy$d[1:10])), "\n")

# note manually reduced iterations it= below
cat("\nsvdr 10 on perturbed matrix with starting subspace\n")
set.seed(1)
cat("time:   \t", system.time(syr <- svdr(y, 10, it=20, extra=30, Q=sx$Q))[3], "\n")
cat("m-m prod: \t", syr$mprod, "\n")
cat("sv err: \t", sqrt(crossprod(syr$d - svdy$d[1:10])), "\n")

Example output looks like:

Reference full SVDs
   user  system elapsed
  6.048   0.108   3.423
   user  system elapsed
  5.764   0.096   3.187

IRLBA 10
time:            0.337
m-v prod:        264
sv err:          2.93946e-08

IRLBA 10 on perturbed matrix
time:            0.359
m-v prod:        278
sv err:          1.510895e-08

IRLBA 10 on perturbed matrix with starting vector
time:            0.321
m-v prod:        250
sv err:          1.257001e-08

svdr 10
time:            2.779
m-m prod:        101
sv err:          9.454343e-06

svdr 10 on perturbed matrix
time:            2.783
m-m prod:        101
sv err:          9.676461e-06

svdr 10 on perturbed matrix with starting subspace
time:            1.152
m-m prod:        41
sv err:          9.166942e-06