MG-RAST/MG-RAST-Tools

Memory management with m5nr-tools.pl

Closed this issue · 6 comments

I was running m5nr-tools.pl to get annotations from protein IDs, and the script quickly consumed all 8GB of memory on my machine and would not release it after shutdown. I tried increasing the batch size to 1000, which helped a bit but did not solve the problem.

The command I ran was:
perl m5nr-tools.pl --api http://api.metagenomics.anl.gov --option annotation --source RefSeq --acc ../acc.txt > ../all-annotation2.txt

The query file is 386 MB and contains 28,581,905 protein IDs.

Thanks for letting us know. We will check. Is there a way you could provide us with your query file?

Thanks for the file. I tried to reproduce the behavior but did not get the same results. However, memory usage did go up to 4GB and then slowly went down over the whole runtime, so the memory usage should be improved.

This might be an issue with the Perl version being used and/or the OS.
On Mar 10, 2014, at 6:37 PM, Andreas Wilke notifications@github.com wrote:

Thanks for the file, I tried to reproduce the behavior but did not get the same results. Anyhow memory usage went up to 4GB and slowly went down over the whole runtime. The memory usage should be improved.



I'm using Perl 5.14.2 on an Ubuntu machine:
This is perl 5, version 14, subversion 2 (v5.14.2) built for x86_64-linux-gnu-thread-multi
3.11.0-18-generic #32-Ubuntu SMP Tue Feb 18 21:11:14 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Reducing the size of the file solved the problem.
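
For anyone hitting the same limit, the workaround of feeding the script smaller files can be automated. The following is only a sketch, assuming a POSIX-style shell with `split` and `mktemp` available. `process_in_chunks` is a hypothetical helper and not part of MG-RAST-Tools, and the chunk size in the usage note is an arbitrary choice. It also assumes the per-chunk outputs can simply be concatenated, which has not been checked against m5nr-tools.pl (for example, if the script prints a header on every run, the header would repeat).

```shell
#!/bin/sh
# Hypothetical helper (not part of MG-RAST-Tools): split a large input file
# into fixed-size chunks and run a command once per chunk, so that each run
# only has to hold one chunk's worth of data.
#
# Usage: process_in_chunks INPUT LINES_PER_CHUNK OUTPUT CMD [ARGS...]
# CMD is invoked with the chunk path appended as its last argument, and its
# standard output is appended to OUTPUT.
process_in_chunks() {
    input=$1
    lines=$2
    output=$3
    shift 3

    # Scratch directory for the chunk files.
    tmpdir=$(mktemp -d) || return 1

    # split names the pieces chunk_aa, chunk_ab, ... so the glob below
    # visits them in the original order.
    if ! split -l "$lines" "$input" "$tmpdir/chunk_"; then
        rm -rf "$tmpdir"
        return 1
    fi

    : > "$output"   # start with an empty output file
    status=0
    for chunk in "$tmpdir"/chunk_*; do
        # Stop at the first failing chunk and report the failure.
        "$@" "$chunk" >> "$output" || { status=1; break; }
    done

    rm -rf "$tmpdir"
    return "$status"
}
```

With the command from the original report, a call could look like `process_in_chunks ../acc.txt 1000000 ../all-annotation2.txt perl m5nr-tools.pl --api http://api.metagenomics.anl.gov --option annotation --source RefSeq --acc`, which places each chunk path right after `--acc`. With the default two-letter suffixes, `split` can produce at most 676 chunks, so the chunk size has to be large enough for the input to fit.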

The script has been updated and now has a reduced memory footprint. Memory usage still depends on input and output size.