PhilBoileau/scPCA

Increase speed and decrease memory usage of (s)cPCA on large datasets.

PhilBoileau opened this issue · 5 comments

We need to increase computation speed and decrease memory usage when running cPCA and scPCA. The current implementation is particularly slow on large datasets, such as those encountered when analyzing scRNA-seq data. We should consider using the more efficient numerical methods provided by packages such as BiocSingular, DelayedArray, and DelayedMatrixStats.

This is motivated by Bioconductor/OrchestratingSingleCellAnalysis#35.

This completely slipped my mind, sorry about that.

Unfortunately, I don't think I'll be able to take advantage of DelayedArray's suite of functions and related packages to increase scPCA's computational efficiency. Under the hood, scPCA computes and operates on covariance matrices, and it relies on eigenvalue decomposition. Based on my brief review of the BiocSingular, DelayedArray, and DelayedMatrixStats documentation, there doesn't seem to be a function for computing eigenvalue decompositions. I could, however, rely on DelayedMatrix's matrix multiplication to compute the sample covariance matrices. Any thoughts or suggestions?
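To illustrate why only the covariance step needs to touch the large matrix, here is a minimal, language-agnostic sketch in Python. It is not scPCA's R code and not the DelayedArray API; the function name blockwise_covariance is hypothetical. It assumes the n × p data arrive one block of rows at a time and keeps only the running p × p cross-product and the column sums, so memory use does not grow with n.

```python
from typing import Iterable, List, Sequence

Matrix = List[List[float]]


def blockwise_covariance(row_blocks: Iterable[Sequence[Sequence[float]]], p: int) -> Matrix:
    """Sample covariance of an n x p matrix supplied as blocks of rows.

    Only the p x p cross-product X^T X and the p column sums are held in
    memory; the full n x p matrix is never materialised.
    """
    n = 0
    col_sums = [0.0] * p
    cross = [[0.0] * p for _ in range(p)]  # running upper triangle of X^T X
    for block in row_blocks:
        for row in block:
            n += 1
            for j in range(p):
                col_sums[j] += row[j]
                for k in range(j, p):
                    cross[j][k] += row[j] * row[k]
    if n < 2:
        raise ValueError("need at least two rows to estimate a covariance")
    means = [s / n for s in col_sums]
    cov = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j, p):
            # (X^T X - n * mean mean^T) / (n - 1), mirrored to keep cov symmetric
            value = (cross[j][k] - n * means[j] * means[k]) / (n - 1)
            cov[j][k] = cov[k][j] = value
    return cov


# Toy data split into two row blocks, standing in for a block-processed matrix.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 7.0], [4.0, 9.0]]
cov = blockwise_covariance([data[:2], data[2:]], p=2)
# cov is [[5/3, 4.0], [4.0, 29/3]]
```

The one-pass formula is the simplest to stream, but it can lose precision when column means are large relative to their spread; centring the data against a preliminary mean estimate before accumulating is a common remedy.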

Otherwise, I can convert DelayedMatrix objects with as.matrix wherever one of the helper functions requires an ordinary matrix. I don't think that's ideal, but it's a quick fix.

LTLA commented

Yes, it might be sufficient to use the DelayedMatrix machinery to compute the sample covariance matrices, and then you can go about your merry way with the remaining decompositions. I'm assuming that, once you get the covariance matrices (as ordinary R matrices), you don't need the originals anymore?

Sounds good! That's right, the DelayedMatrix covariance matrices won't be used again once we have their ordinary R matrix versions.
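Once both covariance matrices are ordinary p × p matrices, the remaining work is small and dense. The following is a minimal Python sketch of that stage, assuming the contrastive step amounts to eigendecomposing C_target − α·C_background and keeping the leading eigenvectors (as in cPCA); any sparsity step scPCA applies on top is not modelled. The names contrastive_covariance and jacobi_eigh are hypothetical, and the textbook cyclic Jacobi solver is only a self-contained stand-in for a LAPACK-backed routine such as R's eigen(); this is not a claim about what scPCA calls internally.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def contrastive_covariance(c_target: Matrix, c_background: Matrix, alpha: float) -> Matrix:
    """Return C_target - alpha * C_background for two p x p covariance matrices."""
    p = len(c_target)
    return [[c_target[i][j] - alpha * c_background[i][j] for j in range(p)]
            for i in range(p)]


def jacobi_eigh(sym: Matrix, tol: float = 1e-12, max_sweeps: int = 50) -> Tuple[List[float], Matrix]:
    """Eigendecomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns eigenvalues sorted in decreasing order and a matrix whose
    columns are the matching unit-norm eigenvectors.
    """
    p = len(sym)
    a = [row[:] for row in sym]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    total = math.sqrt(sum(x * x for row in a for x in row))  # Frobenius norm
    for _ in range(max_sweeps):
        off = math.sqrt(sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j))
        if off <= tol * total:
            break
        for i in range(p - 1):
            for j in range(i + 1, p):
                if a[i][j] == 0.0:
                    continue
                # Angle chosen so that the rotated a[i][j] becomes zero.
                theta = 0.5 * math.atan2(2.0 * a[i][j], a[j][j] - a[i][i])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(p):  # A <- A J (mix columns i and j)
                    aki, akj = a[k][i], a[k][j]
                    a[k][i] = c * aki - s * akj
                    a[k][j] = s * aki + c * akj
                for k in range(p):  # A <- J^T A (mix rows i and j)
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k] = c * aik - s * ajk
                    a[j][k] = s * aik + c * ajk
                for k in range(p):  # V <- V J accumulates the eigenvectors
                    vki, vkj = v[k][i], v[k][j]
                    v[k][i] = c * vki - s * vkj
                    v[k][j] = s * vki + c * vkj
    order = sorted(range(p), key=lambda k: a[k][k], reverse=True)
    values = [a[k][k] for k in order]
    vectors = [[v[r][k] for k in order] for r in range(p)]
    return values, vectors


# Toy 2 x 2 covariances standing in for those computed from the data.
c_target = [[4.0, 1.0], [1.0, 3.0]]
c_background = [[3.0, 0.0], [0.0, 1.0]]
values, vectors = jacobi_eigh(contrastive_covariance(c_target, c_background, alpha=1.0))
leading_direction = [row[0] for row in vectors]  # first contrastive loading vector
```

Because C_target − α·C_background need not be positive semi-definite, the solver handles negative eigenvalues; sorting in decreasing order keeps the directions with the most target-specific variance first.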

@LTLA Through PR #52, scPCA now supports DelayedArray objects! I'll leave this issue open for a little while in case any problems arise from this PR.