statOmics/tradeSeq

Running time for fitGAM

Sa753 opened this issue · 3 comments

Sa753 commented

Dear team,

I am having a problem with fitGAM: it runs for hours without any output, and after 5 h of running it produced this message:

sce <- fitGAM(sce, conditions = factor(sce$treatment), nknots = 5, BPPARAM = BPPARAM)

Fitting lineages with multiple conditions. This method has been tested on a couple of datasets, but is still in an experimental phase.
|+ | 1 % ~08d 09h 38m 34s
Warning message:
In asMethod(object) :
sparse->dense coercion: allocating vector of size 1.6 GiB
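For context on the warning above: a dense numeric matrix stores one 8-byte double per entry, so its size grows with genes × cells. The snippet below is a minimal sketch of that arithmetic; `dense_gib` is a hypothetical helper name, and the dimensions in the example are illustrative, not taken from the dataset in this issue.

```r
# Hypothetical helper: approximate size (in GiB) of a dense numeric matrix.
# Assumes 8 bytes per value (R doubles) and ignores object overhead.
dense_gib <- function(n_genes, n_cells, bytes_per_value = 8) {
  n_genes * n_cells * bytes_per_value / 2^30
}

# Illustrative dimensions only: 10,000 genes x 20,000 cells
dense_gib(10000, 20000)  # about 1.49 GiB
```

Under these assumptions, restricting the gene set before coercion shrinks the allocation proportionally.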

Obviously, I stopped the run after 6 h of no progress. I then extracted the counts from the sce object as a dense matrix using

counts <- as.matrix(sce@assays@data$counts)

but it still keeps running for hours with no output.

sce is a SingleCellExperiment object produced from an integrated Seurat object using the as.SingleCellExperiment() function.

I also tried
sce <- fitGAM(counts = counts, sds= sce@colData$slingshot, conditions = factor(sce$treatment),
nknots = 5)

but the issue is the same: it runs forever with no output.
I am using Seurat version 4 and R version 4.2, and all the packages are up to date.
Could you please advise?

Thanks

Hi there,
I am also a slingshot + condiments + tradeSeq user, and I have some thoughts on this problem.
1. Note that the fitGAM function sets 'parallel = FALSE' by default, so simply setting BPPARAM is not enough; you also need to pass parallel = TRUE.
2. Please also check the dimensions and lineages of your data to avoid unnecessary computational burden. I would use only highly variable genes (instead of all genes) on one or two lineages and conditions.
Below is my code. It takes about 40 minutes to fit a 20k-cell dataset using 10 cores.
```r
library(Seurat)    # VariableFeatures()
library(tradeSeq)  # fitGAM()

set.seed(3)
# sce.fate is the Seurat object from which tgfb was derived
genes <- VariableFeatures(sce.fate)
conditions <- factor(tgfb$condition)

# Use 10 workers for parallel fitting
BPPARAM <- BiocParallel::bpparam()
BPPARAM$workers <- 10

tgfb <- fitGAM(counts = tgfb, nknots = 5,
               conditions = conditions,
               parallel = TRUE,  # FALSE by default
               BPPARAM = BPPARAM,
               genes = genes)
```
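Before committing to a multi-hour run, it may help to time the fit on a small random subset of genes and extrapolate. The sketch below assumes that runtime scales roughly linearly with the number of genes, which is an assumption and not something tradeSeq guarantees; `estimate_runtime` and `fit_fun` are hypothetical names, and the demo uses a stand-in function instead of a real fit so that it runs on its own.

```r
# Hypothetical helper: time `fit_fun` on a random subset of genes and
# extrapolate linearly to the full gene set (assumes linear scaling).
estimate_runtime <- function(fit_fun, genes, n_pilot = 50, seed = 1) {
  set.seed(seed)  # reproducible pilot subset (resets the global RNG state)
  pilot <- sample(genes, min(n_pilot, length(genes)))
  elapsed <- system.time(fit_fun(pilot))[["elapsed"]]
  list(
    pilot_genes = length(pilot),
    pilot_seconds = elapsed,
    estimated_total_seconds = elapsed * length(genes) / length(pilot)
  )
}

# Demo with a stand-in for the real fit (it only counts the genes it receives)
all_genes <- paste0("gene", seq_len(1000))
est <- estimate_runtime(function(g) length(g), genes = all_genes)
est$pilot_genes  # 50
```

With real data, `fit_fun` could wrap the fitGAM call above, for example `function(g) fitGAM(counts = tgfb, nknots = 5, conditions = conditions, parallel = TRUE, BPPARAM = BPPARAM, genes = g)`, with `genes` set to the full vector of variable features.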

Thank you @derekrusso for already providing a great answer.

@Sa753, to know whether what you're experiencing is abnormal, it would be good to hear about the specifics of your dataset (number of conditions, lineages, cells, genes). Thanks!
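The specifics requested above can be collected in one call. This is a minimal base-R sketch; `describe_fit_inputs` is a hypothetical helper, and it assumes you have a genes × cells count matrix, a per-cell condition vector, and a cells × lineages pseudotime matrix at hand. The toy objects below are made up for illustration.

```r
# Hypothetical helper: summarise the dimensions that drive fitGAM's workload.
# counts:     genes x cells matrix
# conditions: one label per cell
# pseudotime: cells x lineages matrix (one column per lineage)
describe_fit_inputs <- function(counts, conditions, pseudotime) {
  c(
    genes = nrow(counts),
    cells = ncol(counts),
    conditions = nlevels(factor(conditions)),
    lineages = ncol(as.matrix(pseudotime))
  )
}

# Toy data for illustration only
toy_counts <- matrix(rpois(6 * 10, lambda = 5), nrow = 6)
toy_conditions <- rep(c("ctrl", "treated"), each = 5)
toy_pseudotime <- cbind(Lineage1 = runif(10), Lineage2 = runif(10))

describe_fit_inputs(toy_counts, toy_conditions, toy_pseudotime)
# genes = 6, cells = 10, conditions = 2, lineages = 2
```

For the objects in this thread, the inputs would presumably be the counts assay of sce, sce$treatment, and the pseudotime matrix from the slingshot results.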

Sa753 commented

Hi @koenvandenberge,

I ran the dataset on a cluster, so unfortunately the parallelisation option did not work. Even on the cluster, it needed around 3 days to run, so I had to stop the run.

I now have 2 conditions and around 20K cells (immune cells). I tried running on highly variable genes only, but it still needs more than 2 days to run.

Thanks