prcomp_irlba
erichson opened this issue · 5 comments
You give the following example for prcomp_irlba; however, the results are not consistent:
> set.seed(1)
> x <- matrix(rnorm(200), nrow=20)
> p1 <- prcomp_irlba(x, n=3)
> summary(p1)
Importance of components%s:
PC1 PC2 PC3
Standard deviation 1.541 1.2513 1.1916
Proportion of Variance 0.443 0.2921 0.2649
Cumulative Proportion 0.443 0.7351 1.0000
> p2 <- prcomp(x, tol=0.7)
> summary(p2)
Importance of first k=3 (out of 10) components:
PC1 PC2 PC3
Standard deviation 1.5411 1.2513 1.1916
Proportion of Variance 0.2806 0.1850 0.1678
Cumulative Proportion 0.2806 0.4656 0.6334
Thanks for noting this! I will add an explicit note about this to the documentation for prcomp_irlba.
It's not an error -- the differences arise because prcomp_irlba computes something different from prcomp: a truncated principal components decomposition. The variances of the omitted components are unknown, so the cumulative summaries cannot be computed.
The current documentation has some notes on the differences, but doesn't mention your example so I will add that. Thanks for pointing it out, the added documentation will help others.
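To make the difference concrete, here is a small sketch (not part of either package) that reproduces both "Proportion of Variance" rows from the example above using the full prcomp decomposition:

```r
set.seed(1)
x <- matrix(rnorm(200), nrow = 20)

# Full decomposition: prcomp knows the variances of all 10 components.
p2 <- prcomp(x)
v <- p2$sdev^2

# prcomp divides each component's variance by the total variance of x:
v[1] / sum(v)       # ~0.2806, the prcomp proportion shown above

# A truncated decomposition only sees the first k = 3 variances, so the
# best it can report is a proportion of the captured variance:
v[1] / sum(v[1:3])  # ~0.443, the prcomp_irlba proportion shown above
```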
Best,
Bryan
OK, see ?prcomp_irlba and example(prcomp_irlba). Let me know if you think that the example can be better. Feel free to send a PR!
Bryan, that is certainly a fix, but I would suggest the approach I take in my randomized principal component analysis: https://github.com/Benli11/rSVD/blob/master/R/rpca.R, line 203.
a) You compute the total variation of the input data as sigma = sum(apply(Re(A), 2, stats::var)), where A denotes the input matrix.
b) Then you can compute the explained variance ratio as sdev^2 / sigma, where sdev denotes the square roots of the approximated dominant k eigenvalues.
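Put together in R, the two steps above might look like this (a sketch of the suggested approach; variable names are mine, and the example matrix reuses the one from the top of the thread):

```r
library(irlba)

set.seed(1)
x <- matrix(rnorm(200), nrow = 20)
p1 <- prcomp_irlba(x, n = 3)

# a) Total variation of the input data (Re() is a no-op for real input):
sigma <- sum(apply(Re(x), 2, stats::var))

# b) Explained variance ratio of the k retained components, now relative
#    to the total variance rather than to the captured variance:
p1$sdev^2 / sigma
```

With this normalization the ratios agree with prcomp's "Proportion of Variance" row for the leading components.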
Best,
Ben
Hi Ben,
Thanks! That didn't occur to me.
I didn't realize someone else had implemented the Martinsson et al. method in R. I guess we have two implementations for now. I added references to your package in the help and vignette.
I'm wondering if, in the medium term, I should just remove that algorithm from the irlba package and simply point to your package instead. Would you consider adding a convergence implementation similar to the one in rsvd? Perhaps I can send a pull request sometime... What would you prefer?
I have found that block methods are superior for pathological problems with clustered singular values; see the examples in irlba. That's why I put the Martinsson method in there in the first place. Ultimately I'd like to have a solid block irlba implementation in there, but I've had trouble getting something that works well in general. (The nice original paper by Baglama and Reichel, http://www.math.uri.edu/~jbaglama/papers/paper17.pdf, has some implementation issues that I have never resolved and that sometimes result in poor performance.)
Again, thanks for helping to improve irlba with your suggestion. Further collaboration would be great!*
Best,
Bryan
* @benli11 perhaps you might be interested in co-maintaining irlba? Just a thought...
Bryan, thanks for adding the references. That is really great! Whether you deprecate the randomized SVD implementation is up to you, of course. In any case, I would be happy to add variants of the randomized SVD if you send a pull request. I think that would make the rsvd package stronger.
I would be very happy to help you co-maintain the irlba package if I can make helpful contributions. I sent you an email regarding ideas for future collaboration. Also, I am looking forward to discussing some of the pathological problems and seeing whether we can come up with better implementations.
Best,
Ben