Question about memory consumption
LQHHHHH opened this issue · 8 comments
Hi,
First, thanks for providing such a good tool. I am using OrthoFinder to infer single-copy genes, but the MCL step takes too much memory, so I decided to run this step on a big-memory node with 1T of RAM. I would like to know whether 1T of memory is enough. My matrix file in mci format has dimensions 31965032x31965032, and the file size is about 1.5T. Here is my mcl -z information.
$ mcl -z
[mcl] cell size: 8
[mcl] cell contents: int and float
[mcl] largest index allowed: 2147483647
[mcl] smallest index allowed: 0
Prune number 10000 [-P n]
Selection number 1100 [-S n]
Recovery number 1400 [-R n]
Recovery percentage 90 [-pct n]
warn-pct 10 [-warn-pct k]
warn-factor 1000 [-warn-factor k]
dumpstem [-dump-stem str]
Initial loop length 0 [-l n]
Main loop length 10000 [-L n]
Initial inflation 2.0 [-i f]
Main inflation 2.0 [-I f]
I would appreciate your answer to my question.
Thanks in advance!
Hello and thanks.
mcl has the option -how-much-ram to give a pessimistic estimate for RAM usage, and a -scheme option to change its strategy for pruning small entries:
> mcl foobar -how-much-ram 31965032
The current settings require at most <682846.75M> RAM for a
graph with <31965032> nodes, assuming the average node degree of
the input graph does not exceed <1400>. This (RAM number)
will usually but not always be too pessimistic an estimate.
> mcl foobar -scheme 5 -how-much-ram 31965032
The current settings require at most <438972.91M> RAM for a
graph with <31965032> nodes, assuming the average node degree of
the input graph does not exceed <900>. This (RAM number)
will usually but not always be too pessimistic an estimate.
> mcl foobar -scheme 4 -how-much-ram 31965032
The current settings require at most <390198.14M> RAM for a
graph with <31965032> nodes, assuming the average node degree of
the input graph does not exceed <800>. This (RAM number)
will usually but not always be too pessimistic an estimate.
The estimates vary from 0.7T RAM for the default setting to 0.4T RAM for the setting -scheme 4.
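Incidentally, these figures appear to be consistent with keeping roughly two matrices of 8-byte cells in memory, each with at most the quoted degree bound of entries per node, so you can reproduce an estimate with a quick back-of-the-envelope calculation (shown here for the default bound of 1400; this is an interpretation of the output above, not an exact formula):
> echo 'scale=2; 2 * 8 * 31965032 * 1400 / 2^20' | bc
682846.75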
If you haven't done so already, reading of the input graph becomes many times faster if you convert the input from the ascii format to binary format:
mcx convert graph.mci graph.mcx
If graph.mci is in the human-readable format, then graph.mcx will be in binary format and much quicker to read.
Finally, I always assume(d) that graphs for orthology will be a bit sparser than most biological networks, due to e.g. best reciprocal hit filtering. I may be wrong of course, but I'm interested to know how many edges/entries there are in your input graph/matrix; if you use e.g. the mcx convert command above it will tell you this number.
Hi,
Thank you for your prompt response! Your answer is really professional and helpful. Just wanted to let you know, my graph has 107744198225 edges. I have converted the mci file to mcx; it takes about 700 GB of memory, and I am now running mcl graph.mcx -I 1.5 -scheme 7 -o graph.I1.5.txt.
I'm wondering, if I use -scheme 6/5/4, will my results be much worse than with -scheme 7? I think I have to trade off time and accuracy.
Thanks,
Qionghou
Thank you - that means that the average number of neighbours per node in your graph is 3370. That is on the one hand a significant pressure on memory, as also indicated by the size of the mcx file, but on the other hand, given that there are 33M nodes in the network, it's not unreasonably large.
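That number is simply the total entry count divided by the node count:
> echo 'scale=1; 107744198225 / 31965032' | bc
3370.6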
I don't think results will be much worse with -scheme 4, and the process should be much quicker. Your setting -scheme 7 is in fact higher than the default setting 6. In the FAQ (which is way too long and verbose - see https://micans.org/mcl/man/mclfaq.html#qsep) this is a summary of my view:
The more severe pruning is, the more the computed process will tend to converge prematurely. This will generally lead to finer-grained clusterings. In cases where pruning was severe, the mcl clustering will likely be closer to a clustering ideally resulting from another MCL process with higher inflation value, than to the clustering ideally resulting from the same MCL process.
As your inflation value is relatively high, and orthology clusters tend to have small diameters, I expect convergence to be quick and the effect of pruning to be relatively benign. As you've started a -scheme 7 run, you could see if it finishes, and if you can spare the (CPU) time, also try -scheme 4, save the output under a different name, and use clm dist to see how different the two results are.
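Concretely, something along these lines should do it (the scheme-4 output name here is just an example):
mcl graph.mcx -I 1.5 -scheme 4 -o graph.I1.5.s4.txt
clm dist graph.I1.5.txt graph.I1.5.s4.txt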
Further notes: (1) I assume you have noticed the -te option to specify the number of CPUs. On a 1T machine you presumably have many CPUs; I would suggest something like -te 16. In the past I've found that the gain from more CPUs becomes negligible, presumably because memory contention/access diminishes any gains; however, I'm plucking this 16 out of thin air now. I'm curious what your experience is, in fact. (2) A highly parallel implementation of mcl exists at https://bitbucket.org/azadcse/hipmcl/wiki/Home, which can use networked machines and scales much further; I don't know how easy it is to install (instructions are given in the page linked).
If you have further questions just comment and I'll reopen/comment.
Hi,
Thanks for your suggestions. It took me a very long time to find a suitable server to complete this task. In the end it took 1.4 TB of memory and 23 days with 16 threads to finish. Thank you so much for helping me!
Cheers,
Qionghou
Hello, thanks for reporting the result, that is quite something (amazing)! I'm curious, did you change the -scheme setting?
Here is my command:
mcl OrthoFinder_graph.mcx -I 1.5 -te 16 -o clusters_OrthoFinder_I1.5.txt
and my MCL version:
$ mcl --version
mcl 22-282
Copyright (c) 1999-2022, Stijn van Dongen. mcl comes with NO WARRANTY
to the extent permitted by law. You may redistribute copies of mcl under
the terms of the GNU General Public License.
I used this command because my next step is to send the results to OrthoFinder, and this command is the one suggested by OrthoFinder.
Hi Qionghou,
Thank you for sharing. If you need to run mcl again on very large data in the future, it can be useful to remember the -scheme parameter (e.g. try -scheme 4). It will reduce the memory requirements and lead to a shorter run-time. For now, it's very cool to know mcl processed a graph with 33M nodes!
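For example, a future run could reuse your invocation above with a lower scheme (the output name here is just illustrative):
mcl OrthoFinder_graph.mcx -I 1.5 -te 16 -scheme 4 -o clusters_OrthoFinder_I1.5.scheme4.txt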