Saving memory by splitting the object?
jjia1 opened this issue · 0 comments
Hello! I was working through the slingshot pipeline, and I wanted to perform some DEG analysis over pseudotime using tradeSeq (as it seems to be the next logical step of the analysis). I was interested in using the conditions argument, but I think it might be too memory-expensive for my current resources. For some reason, BiocParallel multiplies the memory used by one R environment by the number of workers I specify in BPPARAM, e.g. 50 GB x 8 workers = 400+ GB.
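The arithmetic above can be written out as a quick back-of-the-envelope check. This is only a sketch: it assumes every worker holds a full copy of the data, and `budget_gb` is a hypothetical figure for available RAM, not something measured here.

```r
# Rough upper bound on peak memory when each worker holds its own copy of the data.
# Both figures come from the example in the post; they are not measurements.
per_process_gb <- 50  # memory used by a single R process
n_workers <- 8        # number of workers requested in BPPARAM

total_gb <- per_process_gb * n_workers
total_gb  # 400

# Largest worker count that fits within a memory budget.
# budget_gb is a hypothetical value chosen for illustration.
budget_gb <- 128
max_workers <- max(1, floor(budget_gb / per_process_gb))
max_workers  # 2
```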
Rather than trying to fix the parallelization issue (which seems to have been mentioned by others), I wanted to ask whether I could get the same result as the conditions argument by simply splitting my slingshot object by condition. That is, running tradeSeq separately on each of the N condition/timepoint subsets of my slingshot object (to decrease memory use). I think it would also help me plot individual lineage and condition smoothers for my dataset, which has multiple lineages and conditions. For example:
library(SingleCellExperiment)
library(slingshot)
library(tradeSeq)

counts <- as.matrix(assays(sce)$counts)
conds <- factor(colData(sce)$treatment_timepoint)
sds <- SlingshotDataSet(sce)
tradeseq_results <- fitGAM(counts = counts,
                           sds = sds,
                           nknots = 6,
                           genes = genes,       # genes to fit (the HVGs), e.g. 1:25 for genes 1 to 25
                           conditions = conds,  # any number of conditions
                           parallel = TRUE, BPPARAM = BPPARAM,
                           verbose = TRUE)
vs. running the following code for each condition separately.
# Assume sce1 through sce4 are the per-condition subsets of sce.
sce1 <- subset(sce, ...) # placeholder only; I know this isn't valid as written
sce2 <- subset(sce, ...)
tradeseq_results <- fitGAM(counts = as.matrix(assays(sce1)$counts),
                           sds = SlingshotDataSet(sce1), ...)
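The per-condition split described above could be sketched roughly as follows. This is an illustration under assumptions, not a confirmed tradeSeq workflow: `split_by_condition` is a hypothetical helper, the toy matrices stand in for the real data, and the sketch passes pseudotime and cell-weight matrices rather than a subsetted slingshot object. On real data these would presumably come from `slingPseudotime(sds, na = FALSE)` and `slingCurveWeights(sds)`. Whether separate fits are equivalent to a single fit with the conditions argument is exactly the open question of this post.

```r
# Toy stand-ins for the real inputs; all values are made up for illustration.
set.seed(1)
n_genes <- 5
n_cells <- 12
counts <- matrix(rpois(n_genes * n_cells, lambda = 5), nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("cell", seq_len(n_cells))))
conds <- factor(rep(c("ctrl_t1", "ctrl_t2", "trt_t1"), each = 4))
pseudotime <- matrix(runif(n_cells * 2), ncol = 2,
                     dimnames = list(colnames(counts), c("Lineage1", "Lineage2")))
cell_weights <- matrix(1, nrow = n_cells, ncol = 2,
                       dimnames = dimnames(pseudotime))

# Hypothetical helper: split genes-by-cells counts and the cells-by-lineages
# pseudotime and weight matrices into one list element per condition level.
split_by_condition <- function(counts, pseudotime, cell_weights, conds) {
  stopifnot(ncol(counts) == length(conds),
            nrow(pseudotime) == length(conds),
            nrow(cell_weights) == length(conds))
  idx_by_cond <- split(seq_along(conds), conds, drop = TRUE)
  lapply(idx_by_cond, function(idx) {
    list(counts = counts[, idx, drop = FALSE],
         pseudotime = pseudotime[idx, , drop = FALSE],
         cell_weights = cell_weights[idx, , drop = FALSE])
  })
}

subsets <- split_by_condition(counts, pseudotime, cell_weights, conds)
sapply(subsets, function(s) ncol(s$counts))  # cells per condition

# With tradeSeq installed, each subset could then be fitted on its own
# (left commented out so the sketch runs without Bioconductor):
# fits <- lapply(subsets, function(s) {
#   tradeSeq::fitGAM(counts = s$counts,
#                    pseudotime = s$pseudotime,
#                    cellWeights = s$cell_weights,
#                    nknots = 6)
# })
```

Fitting one subset at a time keeps only that subset's matrices in the fit, which is the memory saving being asked about. A subset that leaves a lineage with very few cells may not fit well, so checking the per-condition cell counts first seems prudent.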
Furthermore, is it actually possible to do this for each lineage individually? Or does fitGAM perform any comparisons that would cause some kind of information loss if the data were split this way?