gagneurlab/OUTRIDER

findEncodingDims Out-of-Memory Issues

Dear Gagneur Lab,

I'm currently trying to determine the optimal encoding dimension q for my OUTRIDER analysis, using the following code:

# calculate optimal q
encDimParams <- c(2,5,8,12,14,16,18,20)
q <- findEncodingDim(ods_Filtered, params = encDimParams, BPPARAM=MulticoreParam(1))

pdf(Q_Plot_File)
plotEncDimSearch(q)
dev.off()

However, each time I submit this code on our HPC cluster, the node appears to run out of memory right at the end of the optimisation. I've tried with 50, 150 and even 715 GB of RAM. Do you by any chance have an idea what could be going wrong? I've also looked into the solutions mentioned under #11.

Kind regards,
Laurenz De Cock

Hi Laurenz, sorry for the late reply. Can you tell us the number of genes and samples in your ods_Filtered object?

Dear @vyepez88,

We have a total of 38 samples in our ods_Filtered object and around 350 000 "genes". The "genes" are actually intronic regions, as we are currently collaborating with a lab in the Netherlands that has also used OUTRIDER to detect aberrant usage of introns and exons.

Kind regards,
Laurenz De Cock

I am currently not sure how our algorithm scales with samples and genes.

@vyepez88 how much memory do you need for a similar amount of counts?
38 x 350 000 would be similar to an RNA-seq dataset with 900 samples and 15000 genes.
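
As a quick check of that comparison, the sketch below counts the matrix entries for both shapes and estimates the size of one dense double-precision copy of each. It uses base R only. The 8 bytes per entry is standard for R numerics, but how many such copies OUTRIDER holds during fitting is an open question here, so this describes a single matrix and is not a prediction of total memory use.

```r
# Entries in the count matrix: features x samples
intron_cells <- 350000 * 38    # intron dataset from this issue
rnaseq_cells <- 15000 * 900    # gene-level dataset used for comparison

# Size of one dense double matrix (8 bytes per entry), in MiB
bytes_per_double <- 8
intron_mib <- intron_cells * bytes_per_double / 1024^2
rnaseq_mib <- rnaseq_cells * bytes_per_double / 1024^2

cat(sprintf("intron:  %.0f cells, %.1f MiB per dense copy\n",
            intron_cells, intron_mib))
cat(sprintf("RNA-seq: %.0f cells, %.1f MiB per dense copy\n",
            rnaseq_cells, rnaseq_mib))
```

Both shapes come out at roughly 13 million entries and about 100 MiB per dense copy, so the raw matrix size alone would not explain exhausting hundreds of GB.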

What is the largest dataset we have fitted, and how much memory did it require?

@lemdcock

You could try running:

findEncodingDim(ods_Filtered, params = encDimParams, BPPARAM=MulticoreParam(1), implementation='pca')

with the implementation='pca' option. That should give you at least some results.
This will test which number of principal components is optimal for detecting outliers.

Afterwards you could try to fit OUTRIDER for the optimal dimension.
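
A minimal sketch of that two-step workflow, wrapped in a function so that nothing runs until it is called. It assumes the OUTRIDER and BiocParallel packages are installed, that `getBestQ()` returns the dimension selected by the search, and that `implementation` is forwarded to the fit as in the call above. `fit_with_best_q` is a hypothetical helper name, and using the same implementation for the final fit is an assumption; pass `implementation = "autoencoder"` there if memory allows.

```r
# Hypothetical helper: search q with the PCA implementation,
# then fit OUTRIDER once at the selected dimension.
fit_with_best_q <- function(ods, params, implementation = "pca") {
  bp <- BiocParallel::MulticoreParam(1)  # one worker, as in the thread

  # Step 1: encoding-dimension search over the candidate values
  ods <- OUTRIDER::findEncodingDim(ods, params = params, BPPARAM = bp,
                                   implementation = implementation)

  # Step 2: read the selected dimension from the search results
  best_q <- OUTRIDER::getBestQ(ods)

  # Step 3: fit the model at that single dimension only
  OUTRIDER::OUTRIDER(ods, q = best_q, implementation = implementation,
                     BPPARAM = bp)
}
```

It would be called as `ods_Filtered <- fit_with_best_q(ods_Filtered, encDimParams)`.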

Let us know if that worked!
Best,
Felix

Dear Laurenz,
I'm checking the memory requirements by testing some of our cohorts and will get back to you once that has finished.
In the meantime, note that OUTRIDER was designed to test gene counts in sufficiently expressed genes. I'm not sure how it performs on intronic counts, which are much lower than gene counts and may contain many zeros across samples. Also, we recommend at least 50 samples.
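
To get a feel for how many features are low-count or mostly zero, a rough base-R check along the following lines could help. This is only a sketch on simulated data: the matrix, the thresholds `max_zero_frac` and `min_mean_count`, and the filter itself are assumptions for illustration and not an OUTRIDER function or a recommended cutoff.

```r
# Simulated counts: 1000 hypothetical features x 38 samples,
# drawn from a negative binomial with many zeros
set.seed(1)
counts <- matrix(rnbinom(1000 * 38, mu = 5, size = 0.5), nrow = 1000)

# Per-feature summaries
zero_frac  <- rowMeans(counts == 0)  # fraction of samples with zero count
mean_count <- rowMeans(counts)       # mean count across samples

# Hypothetical thresholds, to be tuned for real data
max_zero_frac  <- 0.5
min_mean_count <- 5

# Keep features that are expressed in enough samples at a sufficient level
keep <- zero_frac <= max_zero_frac & mean_count >= min_mean_count
filtered <- counts[keep, , drop = FALSE]

cat(sprintf("kept %d of %d features\n", nrow(filtered), nrow(counts)))
```

On a real dataset, `counts` would be replaced by the actual count matrix; dropping such features before the fit also reduces the number of rows the model has to handle.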

Dear Laurenz,
I just finished a complete OUTRIDER run, including the optimization of q, on a count matrix of 17248 genes x 741 samples. I provided 30 cores. It took a bit more than 10 h to complete and used 40.65 GB of memory.
Best,
Vicente