Oshlack/Lace

Excessive memory usage on large dataset


On a large dataset (built from 30 mouse samples from different tissues, 100M RNA-seq reads per sample), Lace consistently stalls without reporting an error. I traced this to excessive memory usage (>200 GB of RAM), which exceeds our capacity to run the program on the whole dataset.
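For reference, a minimal sketch of how the peak memory of a run can be confirmed on Linux, assuming a hypothetical Lace command line (launch Lace as a child process and read the children's maximum resident set size afterwards):

```python
import resource
import subprocess

# Hypothetical invocation; substitute the actual Lace command and arguments.
cmd = ["Lace.py", "transcripts.fasta", "clusters.txt", "--cores", "8"]
subprocess.run(cmd, check=True)

# ru_maxrss is reported in kilobytes on Linux.
peak_kb = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
print(f"Peak RSS of child processes: {peak_kb / 1024 / 1024:.1f} GB")
```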

The de novo assembly was conducted in Trinity and the clustering was done using the necklace protocol: https://github.com/Oshlack/necklace.

Possibly related to issue #29 and/or #31.

Hi,

I released a new version of Lace yesterday (1.10) which should hopefully fix this issue. Please test and let us know if you still run into excessive memory issues.

Cheers,
Nadia.

Yes, this ran much faster and drastically reduced the memory consumption (down to 10 GB). Thanks for this update!

However, I did notice 107 repeats of the following message printed to standard error. Lace still finished, and when I went on to do a HISAT alignment I found 107 empty reference sequences, so I'm guessing this error in Lace left empty sequences in the final fasta file. Is this expected with the new changes to the algorithm?

[screenshot: repeated Python traceback printed to standard error]
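For completeness, a minimal sketch of the kind of check used to count the empty records, assuming the output file is named SuperDuper.fasta (an assumption; substitute the actual output file):

```python
def count_empty_records(fasta_path):
    """Count FASTA records whose sequence has zero length."""
    empty = 0
    name, seq_len = None, 0
    with open(fasta_path) as fh:
        for line in fh:
            line = line.rstrip()
            if line.startswith(">"):
                if name is not None and seq_len == 0:
                    empty += 1
                name, seq_len = line[1:], 0
            else:
                seq_len += len(line)
    # Don't forget the final record in the file.
    if name is not None and seq_len == 0:
        empty += 1
    return empty

print(count_empty_records("SuperDuper.fasta"))
```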

Hi Sarah, this looks like a bug. The first exception is okay: if handled correctly it should print an error message but continue on to produce a supertranscript that is just the longest transcript. The sequences should never be empty, so I think the second exception is the real problem. I've commented out the troublesome line of code in the dev version on GitHub, so feel free to try it again (or just comment out the line with "traceback.print_exception()" in Lace/BuildSuperTranscript.py manually if you prefer).
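To illustrate, a rough sketch of the intended fallback behaviour described above (illustrative names only, not the exact code in Lace/BuildSuperTranscript.py):

```python
def build_supertranscript(transcripts, cluster_id):
    """transcripts: dict mapping transcript id -> sequence (illustrative)."""
    try:
        # Hypothetical graph-based assembly step.
        return assemble_supertranscript_graph(transcripts)
    except Exception as exc:
        # Intended fallback: report the failure but still emit a sequence,
        # using the longest input transcript, so no cluster ends up empty.
        print(f"SuperTranscript assembly failed for {cluster_id}: {exc}")
        return max(transcripts.values(), key=len)
```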

That seems to have fixed the bug for me 👍

Great! I'll close this and the networkx version issue.