bwlewis/irlba

irlba and svd don't match

isglobal-brge opened this issue · 4 comments

I have a square matrix M that is obtained after performing some calculations. I need to compute its SVD. I've tried both the svd and irlba functions and the results are not the same. I would be very grateful if you could indicate how to get the SVD using irlba, since your method is extremely useful for my problem. The matrix is attached to this message and this is the code I use:

> library(irlba)
[M.txt](https://github.com/bwlewis/irlba/files/806438/M.txt)

> t1 <- svd(M)
> t2 <- irlba(M, 5)
> t3 <- irlba(M, 400)
> t1$u[1:5,1:5]
              [,1]         [,2]         [,3]        [,4]         [,5]
[1,]  1.337658e-05 -0.022042268  0.072997682 -0.06740229  0.024306528
[2,]  4.150039e-03 -0.088295027 -0.014797329 -0.06674204 -0.040441062
[3,] -1.461567e-03 -0.005261221 -0.009778176  0.09441301 -0.059103520
[4,] -3.989281e-02  0.061646129 -0.021303995  0.02768563 -0.058945328
[5,] -3.433767e-02  0.021539362 -0.016010397 -0.02408006  0.002703832
> t2$u[1:5,1:5]
            [,1]         [,2]          [,3]          [,4]        [,5]
[1,] 0.004987207 -0.004488571 -0.0676958252  0.0511607108 -0.05923026
[2,] 0.001833914 -0.089510673  0.0004203942 -0.0269936629  0.05055553
[3,] 0.001189413 -0.012061085 -0.0166318788  0.0971423657  0.10112741
[4,] 0.037901336  0.070035622  0.1303613492  0.0001745698  0.02288906
[5,] 0.027499900  0.009384006 -0.0056690380  0.0468421469  0.01239040
> t3$u[1:5,1:5]
              [,1]         [,2]         [,3]        [,4]          [,5]
[1,]  0.0005569089  0.022238880 -0.072787347  0.06127649  0.0008511529
[2,]  0.0084894343  0.087430938  0.014782482  0.06570138 -0.0463571028
[3,] -0.0026525785  0.005337798  0.009813996 -0.09609238 -0.0642166988
[4,] -0.0427231000 -0.059921706  0.022765963 -0.02962612 -0.0642055456
[5,] -0.0344191182 -0.020140115  0.015668233  0.02770949  0.0123238452

A few things:

Try the current development version: devtools::install_github("bwlewis/irlba"). Indeed, the version now on CRAN does very poorly on your problem!

Interestingly, Giuseppe Rodriguez pointed out nearly the same thing (see issue #19). A partial improvement is in place, pending a more comprehensive solution later.

Now, specifically with respect to your matrix M, notice that:

x = as.matrix(read.table("https://github.com/bwlewis/irlba/files/806438/M.txt", header=TRUE))
s = svd(x)
table(round(s$d, 5))

0.5   1 
263 142 

That is, to within five digits of accuracy, your matrix has just two distinct singular values: 263 of them are about 0.5 and the remaining 142 are all approximately equal to one.

Algorithms like irlba are designed to find the first few largest (or smallest) singular values. But the 142 largest singular values in this case are all about the same (all close to 1). In this kind of situation it's hard for irlba to pick out exactly the right subspace (really, any subspace within the span of the first 142 singular vectors is about as good as any other!).
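To see why the singular vectors themselves are not well defined here, consider a small synthetic example (not your attached matrix; the names and dimensions below are just for illustration): a matrix built with two clusters of repeated singular values has a unique dominant subspace, but no unique set of singular vectors within it.

```r
# Sketch (synthetic, not the attached M): a matrix with repeated singular
# values has a well-defined dominant subspace but non-unique singular vectors.
set.seed(1)
Q <- qr.Q(qr(matrix(rnorm(100), 10, 10)))        # a random orthogonal matrix
A <- Q %*% diag(c(rep(1, 4), rep(0.5, 6))) %*% t(Q)
# Two clusters, just like your matrix (142 near 1, 263 near 0.5) in miniature:
table(round(svd(A)$d, 5))                         # 0.5 appears 6x, 1 appears 4x
```

Any orthonormal basis of the 4-dimensional "large" subspace of A is an equally valid set of leading singular vectors, which is exactly the ambiguity irlba runs into on your matrix.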

However, even so, the current version of irlba does more or less OK on your problem. Consider:

x = as.matrix(read.table("https://github.com/bwlewis/irlba/files/806438/M.txt", header=TRUE))
s = svd(x)

library(irlba)
L = irlba(x, 5)

round(crossprod(L$u, s$u[,1:5]), 3)
       [,1]   [,2]   [,3]   [,4]   [,5]
[1,]  1.000 -0.011 -0.016 -0.013  0.001
[2,] -0.011 -1.000 -0.002 -0.003 -0.001
[3,] -0.015  0.001 -0.991  0.073  0.028
[4,] -0.001  0.003 -0.027 -0.141 -0.099
[5,] -0.003  0.001 -0.039 -0.054 -0.034

This shows that the first three vectors computed by irlba are almost the same as the corresponding vectors computed by svd (up to sign). The other two are not, but they turn out to lie in the span of other singular vectors associated with singular values near one anyway.
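The right comparison in a case like this is between subspaces, not individual columns. A minimal synthetic sketch (the matrices U1 and U2 below are illustrative, not from your data): two orthonormal bases can have wildly different columns yet span exactly the same subspace, which projection norms reveal.

```r
# Sketch (synthetic): compare subspaces rather than individual basis vectors.
set.seed(1)
Q  <- qr.Q(qr(matrix(rnorm(100), 10, 10)))
U1 <- Q[, 1:4]                                    # one orthonormal basis
U2 <- U1 %*% qr.Q(qr(matrix(rnorm(16), 4, 4)))    # a rotated basis, same span
round(crossprod(U1, U2), 3)                       # columns look quite different...
# ...yet each column of U2 projects onto span(U1) with norm 1,
# so the two subspaces coincide exactly:
sqrt(colSums(crossprod(U1, U2)^2))
```

The same projection-norm check applied to irlba's output against the first 142 svd vectors would show its vectors lying in the dominant subspace even when individual columns disagree.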

Note that this can be improved by setting a smaller tol and a larger work value in irlba, for example:

L = irlba(x, 5, tol=1e-14, work=30)
round(crossprod(L$u, s$u[,1:5]), 3)
       [,1]   [,2]   [,3]   [,4]   [,5]
[1,] -0.999  0.031  0.014  0.008  0.006
[2,] -0.031 -0.999 -0.003 -0.008  0.006
[3,] -0.014  0.003 -1.000 -0.002  0.003
[4,] -0.008  0.008  0.002 -0.997 -0.039
[5,] -0.006 -0.005 -0.004  0.002 -0.860

(now all of them are pretty close).

Looking at the orthogonality of the vectors is perhaps more insightful than comparing individual entries in this case, at least as far as what the svd is trying to compute.

Does that help?

This is an excellent question. Thanks to real-world examples like this, the package has improved a lot. I expect a new version on CRAN in a week or so.

Let me just add, for clarity, that since the largest 142 singular values are all approximately the same (to within 5.565548e-13!), the choice of largest singular vectors is somewhat arbitrary in your problem.

It's not clear that one can easily distinguish, say, one 5-dimensional subspace associated with the largest singular values from another. Effectively, any subspace within the span of the first 142 singular vectors is as valid as any other.

It's more interesting in your problem to find the split between the large and small singular values, and then decompose the whole space into the 142-dimensional subspace of large values and the complementary 263-dimensional subspace of small ones.
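That split is easy to compute once you know where the gap in the spectrum sits. A minimal sketch on the same kind of synthetic matrix as above (the 0.75 threshold is an assumption placed between your two clusters near 1 and 0.5):

```r
# Sketch (synthetic): split the spectrum at the gap between the clusters and
# form the orthogonal projector onto the dominant subspace.
set.seed(1)
Q <- qr.Q(qr(matrix(rnorm(100), 10, 10)))
A <- Q %*% diag(c(rep(1, 4), rep(0.5, 6))) %*% t(Q)
s <- svd(A)
big <- s$d > 0.75                  # threshold chosen between the clusters
P <- tcrossprod(s$u[, big])        # projector onto the "large" subspace
# P is idempotent (P %*% P == P up to rounding); I - P projects onto
# the complementary "small" subspace.
c(rank = sum(big), idempotency_error = max(abs(P %*% P - P)))
```

For your matrix the analogous projector onto the 142-dimensional subspace (and its complement) captures everything the svd can meaningfully say about the large singular values.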

Is it OK to close this? Does the above help?