bwlewis/irlba

leading eigenvectors are very dependent on seed when there are many small eigenvals

wamiks opened this issue · 5 comments

Hi,

Using your code from your RSPectra comparison with eigenvalues like these:

 [1] 1.2010e+05 3.3975e+04 1.5586e+04 8.3248e+03 7.5574e+03 7.0686e+03 6.2953e+03 6.1647e+03 5.5930e+03 5.3994e+03 4.9431e+03         
 [12] 4.6991e+03 4.3388e+03 4.1773e+03 3.9298e+03 3.8293e+03 3.7712e+03 3.7303e+03 3.6607e+03 3.5395e+03 3.4908e+03 3.3796e+03         
 [23] 3.2971e+03 3.2209e+03 3.1451e+03 3.0029e+03 2.8943e+03 2.8009e+03 2.7219e+03 2.6431e+03 9.7071e-10 6.6343e-10 2.8837e-10         
 [34] 1.8375e-10 1.0578e-10 1.0274e-10 7.8556e-11 7.1549e-11 5.6877e-11 4.8820e-11 4.0724e-11 3.5984e-11 3.4707e-11 2.8522e-11         
 [45] 2.5319e-11 2.5076e-11 2.3933e-11 2.3539e-11 2.1689e-11 2.1335e-11 2.0271e-11 1.9699e-11 1.9426e-11 1.8862e-11 1.8385e-11         
 [56] 1.8283e-11 1.7684e-11 1.7657e-11 1.6216e-11 1.5041e-11 1.3888e-11 1.3781e-11 1.2863e-11 1.2862e-11 1.2635e-11 1.2570e-11         
 [67] 1.2540e-11 1.1974e-11 1.1850e-11 1.1368e-11 1.1355e-11 1.1192e-11 1.0941e-11 1.0780e-11 1.0678e-11 1.0511e-11 9.8443e-12         
 [78] 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12         
 [89] 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12         
[100] 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12         
[111] 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12         
[122] 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12         
[133] 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12         
[144] 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12         
[155] 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12         
[166] 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12         
[177] 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12         
[188] 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12 9.8073e-12         
[199] 9.8073e-12 9.8073e-12 9.8073e-12 

for example gives very different leading eigenvectors depending on the random seed. svd and RSpectra seem mostly stable using this.

best,
michael

Sorry, can you give a code example and I will investigate? Thanks.

Sorry I should be more specific, I don't follow which example you're referring to. There are several examples there and none of them have exactly 200 singular values (some many more, some far less). Can you point me to a specific example? Sorry to be so confused.Thanks...

Sorry, it's from example 1. Use the above set for d and the run the rest twice with 2 different seeds. The 1st eigenvalues from the 2 runs will have low correlation. I know this is probably an edge case, but it might point out some underflow or something in the algo.

Hmmmm. I wrote this test:

ibrary(irlba)                                                                                                                           
f = function(seed)
{
  d = c(1.2010e+05,3.3975e+04,1.5586e+04,8.3248e+03,7.5574e+03,7.0686e+03,6.2953e+03,6.1647e+03,5.5930e+03,5.3994e+03,4.9431e+03,4.6991e+03,4.3388e+03,4.1773e+03,3.9298e+03,3.8293e+03,3.7712e+03,3.7303e+03,3.6607e+03,3.5395e+03,3.4908e+03,3.3796e+03,3.2971e+03,3.2209e+03,3.1451e+03,3.0029e+03,2.8943e+03,2.8009e+03,2.7219e+03,2.6431e+03,9.7071e-10,6.6343e-10,2.8837e-10,1.8375e-10,1.0578e-10,1.0274e-10,7.8556e-11,7.1549e-11,5.6877e-11,4.8820e-11,4.0724e-11,3.5984e-11,3.4707e-11,2.8522e-11,2.5319e-11,2.5076e-11,2.3933e-11,2.3539e-11,2.1689e-11,2.1335e-11,2.0271e-11,1.9699e-11,1.9426e-11,1.8862e-11,1.8385e-11,1.8283e-11,1.7684e-11,1.7657e-11,1.6216e-11,1.5041e-11,1.3888e-11,1.3781e-11,1.2863e-11,1.2862e-11,1.2635e-11,1.2570e-11,1.2540e-11,1.1974e-11,1.1850e-11,1.1368e-11,1.1355e-11,1.1192e-11,1.0941e-11,1.0780e-11,1.0678e-11,1.0511e-11,9.8443e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12,9.8073e-12)
  set.seed(seed)
  N = 3
  tol = 1e-15
  m = length(d)
  u = qr.Q(qr(matrix(rnorm(m ^ 2), m)))
  v = qr.Q(qr(matrix(rnorm(m ^ 2), m)))
  x = u %*% diag(d) %*% t(v)
  s = svd(x)
  l = irlba(x, N, tol=tol)
  abs(crossprod(a$s$u[,1], a$l$u[,1])) - 1
}

and on my laptop I get for 100 seed values from 1...100:

ans = unlist(Map(f, seq(100)))
summary(ans)
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
## -8.882e-16 -8.882e-16 -8.882e-16 -8.882e-16 -8.882e-16 -8.882e-16

Note that singular vectors are unique only up to sign, so perhaps the difference you're seeing is a sign difference? The comparison above takes the sign difference into account.

I do know of other pathologies with irlba, but I'm not sure that this example fits into those cases, unless maybe I missed something...