Support non-default matrix types?
LTLA opened this issue · 3 comments
I was wondering whether it would be possible to support non-default matrix types in rsvd? For example, anything from Matrix, or some of our custom matrix classes in Bioconductor packages. Some testing suggests that this would only require minor modifications to the existing code, namely:
- Removal of the as.matrix(A) line near the top of rsvd.R.
- Addition of importFrom(Matrix,crossprod) to the NAMESPACE.
And then stuff like this automatically works without trying to expand the matrix into a dense array:
library(Matrix)
library(rsvd)
out <- rsvd(rsparsematrix(10000, 10000, 0.01), k=10)
In our case, we're dealing with fairly huge matrices (>100 GB in RAM) that are held on file. We have %*% and crossprod defined, so the only things preventing us from using rsvd() are the two points above.
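To illustrate why %*% and crossprod are enough, here is a minimal sketch of a generic randomized range finder followed by a small dense SVD. It is not the actual rsvd code; the variable names (Omega, Y, Q, B) and the matrix sizes are my own choices for illustration. The only operations applied to the big matrix A are %*% and crossprod, so A itself is never expanded into a dense array:

```r
library(Matrix)

set.seed(1)
A <- rsparsematrix(1000, 500, 0.01)   # sparse test matrix (illustrative size)
k <- 10                               # target rank

# Random test matrix; small and dense (500 x k).
Omega <- matrix(rnorm(ncol(A) * k), ncol(A), k)

# Sample the range of A. Only %*% is needed on A.
Y <- A %*% Omega                      # 1000 x k, still a Matrix object

# Orthonormal basis for the sampled range; Y is small, so densifying it is cheap.
Q <- qr.Q(qr(as.matrix(Y)))           # 1000 x k

# Project A onto the basis. Only crossprod is needed on A.
B <- crossprod(A, Q)                  # t(A) %*% Q, 500 x k

# SVD of the small k x 500 matrix t(B) = t(Q) %*% A.
s <- svd(t(as.matrix(B)))
U <- Q %*% s$u                        # approximate left singular vectors
d <- s$d                              # approximate singular values
V <- s$v                              # approximate right singular vectors
```

Any matrix class that provides its own %*% and crossprod methods (including a file-backed one) could stand in for A here, which is the reason the as.matrix(A) coercion is the main obstacle.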
I'm happy to put in a PR on this matter if you're open to it.
@LTLA it would indeed be really great to make these modifications. I would be very happy for you to push the changes to the repository. Otherwise, I can of course do it as well.
Also, it would be great to demonstrate the performance on some fairly big matrices. Maybe we can put together a short blog post showing some results and sketching the idea of randomized methods for linear algebra, if that is something you are interested in. But we can talk offline about this.
Best,
Ben
Thanks @erichson. In fact, there are a bunch of us in the Bioconductor community working on improving the scalability of our existing pipelines (in this case, for single-cell RNA sequencing data in the Human Cell Atlas project), so this is definitely an area where we're interested in both helping out and getting help. We can probably discuss this elsewhere, but I'll tag @kasperdanielhansen and @mikejiang before I forget.
@LTLA okay great, let's touch base soon. I would like to better understand what your needs are and how I can help. Also, it is great to e-meet you, @kasperdanielhansen and @mikejiang.