Verifying memory consumption measurements
karel-brinda opened this issue · 8 comments
Our benchmarks report relatively low memory requirements, even in situations where I would expect more.
When I recently ran the plasmid DB experiment with `max_ram_gb: 25`, I got the following memory results:
https://github.com/karel-brinda/mof-experiments/blob/7703f62bbb4c04383ed430024617c35fe9ece941/experiments/A60_mof_search_experiments_C/c23_ebiplasmids_2022_12_01__memstream_withcobs_filter_autothr/benchmarks/match_2022_12_01T16_30_11.txt
i.e., mem 11415960 kB ≈ 11.4 GB.
When I looked at the memory consumption of the individual COBS instances and compared the top-level figure to the highest number among them, they were exactly the same:
https://github.com/karel-brinda/mof-experiments/blob/7703f62bbb4c04383ed430024617c35fe9ece941/experiments/A60_mof_search_experiments_C/c23_ebiplasmids_2022_12_01__memstream_withcobs_filter_autothr/benchmarks/run_cobs/pseudomonas_aeruginosa__01____all_ebi_plasmids___reads_1___reads_2___reads_3___reads_4.txt
11415960 kB.
Is it even theoretically possible that the Snakemake-managed run as a whole wouldn't increase the memory consumption beyond that of a single COBS instance?
@leoisl Do you have any possible explanation of this?
Here is a screenshot of htop from rerunning the same experiment with mmap:
The RES column suggests that the total mem consumption of COBS is

```
$ p3 7.8+4.7+1.8+4.6+3.0+1.6+3.5
27.0
```
i.e., 27 GB, which roughly corresponds to the requested limit of 25 GB (I guess at some point something uses GiB instead of GB, so it doesn't match perfectly).
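(For what it's worth, 25 GiB = 25 × 1024³ B ≈ 26.8 GB, so if one side of the comparison is in GiB and the other in GB, a 25-vs-27 discrepancy of roughly this size is exactly what one would expect.)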
I’ve got the following results:
https://github.com/karel-brinda/mof-experiments/blob/c786527649d676304927e97d4d330485efd65a60/experiments/A60_mof_search_experiments_C/c24_ebiplasmids_2022_12_02__mmap_withcobs_filter_autothr/benchmarks/match_2022_12_02T13_01_19.txt
I.e., mem reported to be 11.2 GB, which is very different both from what htop was reporting during the computation (27 GB, see above) and from the expected mem requirements in the config (max mem specified to be 25 GB, see above).
And again, the max mem is equal to the max mem value reported for P. aeruginosa:
And this batch definitely wasn't the last one (i.e., multiple other COBS instances must have been running at the same time); see the corresponding Snakemake log:
https://github.com/karel-brinda/mof-experiments/blob/c786527649d676304927e97d4d330485efd65a60/experiments/A60_mof_search_experiments_C/c24_ebiplasmids_2022_12_02__mmap_withcobs_filter_autothr/snakemake_logs/2022-12-02T080120.213893.snakemake.log.xz
Also, it's theoretically possible that the measurements are indeed correct. OS X might be doing some background optimization to minimize the memory footprint, like not physically backing memory pages that contain only zeros, smart use of swapping, etc. If that's the case, it does it very well!!
I think this might actually be an issue: for some reason, `/usr/bin/time` does not sum up RAM usage from subprocesses...
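For comparison, here is a minimal sketch of what summing RSS over a whole process tree could look like (just an illustration using psutil and periodic sampling; not necessarily the new method mentioned in the next comment, and the 0.5 s interval is arbitrary):

```python
#!/usr/bin/env python3
"""Run a command and report the peak of the *summed* RSS of the process
and all of its descendants, sampled at a fixed interval (needs psutil)."""
import sys
import time

import psutil

cmd = sys.argv[1:]  # e.g. the cobs query command
proc = psutil.Popen(cmd)
peak = 0  # bytes

while proc.poll() is None:
    total = 0
    for p in [proc] + proc.children(recursive=True):
        try:
            total += p.memory_info().rss
        except psutil.NoSuchProcess:
            pass  # the process exited between listing and querying it
    peak = max(peak, total)
    time.sleep(0.5)

print(f"peak summed RSS: {peak / 2**30:.1f} GiB", file=sys.stderr)
```

Sampling obviously misses short spikes between samples, but it should at least distinguish "max RSS of the largest single COBS instance" from "sum over all concurrently running instances".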
I have a new way to measure RAM; I'm running it on the plasmid query to see how differently RAM is measured with `time` vs. this new method...
This seems to be another possible approach to measuring memory: https://gist.github.com/netj/526585
That uses `/bin/ps`; I am afraid we would get similar results to our current approach... would you be able to test it?
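In case it helps with testing: a rough sketch of what a ps-based sampler could look like, walking the pid/ppid table from a single `ps` snapshot so that it works with both Linux and macOS `ps` (untested, just to illustrate the idea; one would call it repeatedly and keep the maximum):

```python
#!/usr/bin/env python3
"""Sum the RSS (in kB) of a process and all of its descendants
from a single `ps` snapshot."""
import subprocess
import sys

def tree_rss_kb(root_pid: int) -> int:
    out = subprocess.run(
        ["ps", "-A", "-o", "pid=,ppid=,rss="],
        check=True, capture_output=True, text=True,
    ).stdout
    children, rss = {}, {}
    for line in out.splitlines():
        pid, ppid, kb = line.split()
        children.setdefault(int(ppid), []).append(int(pid))
        rss[int(pid)] = int(kb)
    # walk the tree rooted at root_pid
    total, stack = 0, [root_pid]
    while stack:
        pid = stack.pop()
        total += rss.get(pid, 0)
        stack.extend(children.get(pid, []))
    return total

if __name__ == "__main__":
    print(tree_rss_kb(int(sys.argv[1])), "kB")
```

It shares the same sampling limitation as the psutil sketch above, so whether it ends up matching htop better than `/usr/bin/time` is exactly what would need testing.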
For the paper, memory was eventually measured by SLURM.
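For completeness, a minimal sketch of one way to pull the peak RSS of a finished job out of SLURM accounting (via sacct; the job ID is a placeholder, and this is not necessarily the exact setup used for the paper):

```python
#!/usr/bin/env python3
"""Print JobID, JobName, MaxRSS and Elapsed for a finished SLURM job.
Note: MaxRSS is filled in per job step, not on the parent job line."""
import subprocess
import sys

job_id = sys.argv[1]  # e.g. "12345678" -- placeholder
out = subprocess.run(
    ["sacct", "-j", job_id,
     "--format=JobID,JobName,MaxRSS,Elapsed",
     "--parsable2", "--noheader"],
    check=True, capture_output=True, text=True,
).stdout

for line in out.splitlines():
    print("\t".join(line.split("|")))
```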