Increase speed and decrease memory usage of (s)cPCA on large datasets.
PhilBoileau opened this issue · 5 comments
We need to increase computation speed while decreasing memory usage when running cPCA and scPCA. The current implementation is particularly slow on large datasets, like those encountered when analyzing scRNA-seq data. We should consider implementing more efficient numerical methods from packages like BiocSingular, DelayedArray, and DelayedMatrixStats.
This is motivated by Bioconductor/OrchestratingSingleCellAnalysis#35.
Definitely bumping up against the lack of DelayedArray support now:
This completely slipped my mind, sorry about that.
Unfortunately, I don't think that I'll be able to take advantage of DelayedArray's suite of functions and related packages to increase scPCA's computational efficiency. scPCA computes and operates over covariance matrices under the hood, and relies on eigenvalue decomposition. Based on my brief review of the BiocSingular, DelayedArray, and DelayedMatrixStats documentation, there doesn't seem to be a function for computing eigenvalue decompositions. I could rely on DelayedMatrix's matrix multiplication to compute the sample covariance matrices, though. Any thoughts or suggestions?
Otherwise, I can run DelayedMatrix objects through as.matrix when required in one of the helper functions. I don't think that's ideal, but it's a quick fix.
Yes, it might be sufficient to use the DelayedMatrix machinery to compute the sample covariance matrices, and then you can go about your merry way with the remaining decompositions. I'm assuming that, once you get the covariance matrices (as ordinary R matrices), you don't need the originals anymore?
Sounds good! That's right, the DelayedMatrix covariance matrices won't be used again once we have their ordinary R matrix versions.
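For reference, here is a minimal sketch of the idea agreed on above: accumulate the sample covariance matrix one block of rows at a time, so that only the small features-by-features result has to live in memory as an ordinary matrix. It is written in plain Python to stay language-agnostic. It is not scPCA's code and does not use the DelayedArray API. The names `iter_row_blocks`, `blockwise_covariance`, and `contrastive_covariance` are hypothetical, and the last function assumes the standard cPCA formulation in which the matrix being decomposed is the target covariance minus a contrast parameter times the background covariance.

```python
from typing import Iterable, Iterator, List, Sequence

Matrix = List[List[float]]


def iter_row_blocks(rows: Sequence[Sequence[float]], block_size: int) -> Iterator[Sequence[Sequence[float]]]:
    """Yield consecutive blocks of rows (hypothetical stand-in for reading chunks from disk)."""
    for start in range(0, len(rows), block_size):
        yield rows[start:start + block_size]


def blockwise_covariance(blocks: Iterable[Sequence[Sequence[float]]], n_features: int) -> Matrix:
    """Sample covariance (rows = observations) accumulated block by block.

    Only the column sums and the p x p cross-product matrix are kept in memory.
    This one-pass formula can lose precision when means are large relative to
    the variances; centering the data first avoids that.
    """
    p = n_features
    n = 0
    col_sums = [0.0] * p
    cross = [[0.0] * p for _ in range(p)]
    for block in blocks:
        for row in block:
            n += 1
            for i in range(p):
                xi = row[i]
                col_sums[i] += xi
                cross_i = cross[i]
                for j in range(p):
                    cross_i[j] += xi * row[j]
    if n < 2:
        raise ValueError("need at least two observations")
    return [
        [(cross[i][j] - col_sums[i] * col_sums[j] / n) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def contrastive_covariance(target_cov: Matrix, background_cov: Matrix, contrast: float) -> Matrix:
    """Target covariance minus contrast * background covariance (assumed cPCA form)."""
    return [
        [t - contrast * b for t, b in zip(t_row, b_row)]
        for t_row, b_row in zip(target_cov, background_cov)
    ]
```

Once both covariance matrices exist as ordinary in-memory matrices, the original large data objects are no longer needed, and the eigenvalue decomposition can run on the small contrastive matrix with any standard routine.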