amplab/snap

controlling memory usage for sort/index/markdup

teerjk opened this issue · 13 comments

I'm trying to control memory usage during the sort/index/markdup step (-so). I have set -sm, but memory usage is roughly the same no matter what value I give: -sm settings from 0.5 to 4, as well as leaving -sm unset, produce similar usage. Only setting -sm to 20 provoked a change (the run ran out of memory on a 165GB node). Memory usage also looked similar with 4, 8, and 16 threads.

Usage generally approaches everything available on the node, as reported by "RES" in top and by the cluster software (PBS/Torque): on a 64GB node it was ~63GB; on a 165GB node it was ~156GB. Usage is fairly stable during alignment (40-60GB), but climbs to all available memory once "sorting" appears on stdout. I looked through the code and didn't see anything immediately obvious, but I'm not expert enough in C++ to really dig into the lower-level details.
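For reference, the invocation I'm using has roughly this shape (paths and file names here are placeholders, not my actual data):

```
# Hypothetical example of the command shape: -so enables the
# sort/index/markdup step, -sm should cap sort memory (in GB),
# and -t sets the thread count.
snap-aligner paired /path/to/index sample_1.fastq.gz sample_2.fastq.gz \
    -o sample.bam -t 16 -so -sm 4
```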
Any thoughts or hints?
Thanks!

That is the behavior I was expecting: total sort/markdup usage bounded by -sm, on top of the memory used by alignment. I am using version 84516be, which I believe is the latest master; I'm happy to try a different version. I did try -di, but it didn't make a difference.
I have been testing on one particular file, so I will try some others. One interesting observation is that runs completed on both a 165GB node and a 64GB node, with memory usage approaching each node's total in both cases.
One possible issue is that the Linux kernel on our CentOS cluster is very old, which may affect memory allocation behavior.
At the end of the day, I'm trying to predict and control usage in order to be a good citizen on our HPC cluster and, when we eventually run in the cloud, to request appropriately sized resources.
Great work by the way! I haven't thoroughly evaluated the alignment quality, but there's no denying the performance increase.

Sorry, I should have said that I'm testing on a human whole-exome sample with a final BAM size of ~20GB. I'm seeing about 20% duplication overall, which isn't terribly high for these samples. I've run the same sample through a bwa-picard pipeline, and picard MarkDuplicates used 7.8GB (I requested 8GB).
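For comparison, the picard step in that pipeline was a standard MarkDuplicates run with the memory request enforced via the JVM heap flag, roughly like this (file names are placeholders):

```
# Hypothetical sketch of the bwa-picard comparison step; -Xmx8g is
# how the 8GB memory cap was set on the Java side.
java -Xmx8g -jar picard.jar MarkDuplicates \
    I=sample.bwa.bam O=sample.markdup.bam M=sample.metrics.txt
```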

While I cannot share the initial example, I've identified a public data file pair that behaves similarly:
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR359/SRR359295/SRR359295_1.fastq.gz
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR359/SRR359295/SRR359295_2.fastq.gz

These files are of similar size and duplicate rate, and I see the same increasing memory usage.
I did watch the memory usage closely and noticed that, once the run reaches the sorting step, much of the usage is actually page cache. However, top and htop still count that memory in the snap-aligner process's "RES": ~148GB on a 165GB node with 16 threads and -sm 2. My concern is that other processes on a multi-user system cannot access that memory. Of course, I could solve the problem by grabbing entire nodes, but that brings its own issues.
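One way to see how much of that RES is file-backed page cache (reclaimable) rather than anonymous memory is to sum the per-mapping fields in /proc/&lt;pid&gt;/smaps. A sketch, assuming a kernel new enough to report the Anonymous: field and a process named snap-aligner:

```
# Sum total RSS and its anonymous portion across all mappings of the
# snap-aligner process; the difference is file-backed and reclaimable.
pid=$(pgrep -o snap-aligner)
awk '/^Rss:/ {rss += $2} /^Anonymous:/ {anon += $2}
     END {printf "RSS: %.1f GB, anonymous: %.1f GB, file-backed: %.1f GB\n",
          rss/1048576, anon/1048576, (rss - anon)/1048576}' /proc/$pid/smaps
```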
Thanks.

Thanks for sharing the test data. I will look at the issue and try to reproduce what you are seeing.

--Arun

I've been watching some runs closely and have noticed that during the sort, "RES" memory as reported by top climbs to almost the node's physical maximum. However, if I look at usage with "free", most of the used memory is actually cache: the "-/+ buffers/cache:" line shows only ~20-30GB used, with the rest free (cached). If I start another big-memory job (samtools sort), snap's memory usage (RES) goes down as samtools's goes up. Interestingly, once snap finishes, I can observe samtools's memory usage alone with free: it is all "used", with very little "cached". So I think snap's usage is just behaving differently than what I'm used to seeing.
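For illustration, the pattern during sorting looks roughly like this on the 165GB node (numbers invented to match the behavior described above, not a pasted capture):

```
$ free -g
             total       used       free     shared    buffers     cached
Mem:           165        156          9          0          0        128
-/+ buffers/cache:         28        137
Swap:            7          0          7
```

So nearly everything is "used" on the top line, but the second line shows most of that is cache the kernel can reclaim under pressure.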

One more separate but related issue. I'm testing the serial mode where you give multiple samples on the command line and snap processes them one at a time. I'm testing on a set of 192 smaller pairs (10M reads each), and once it gets to pair 60 or so, alignment slows WAY down. It starts at around 20-30 sec per pair, but the last few pairs take longer and longer: 50-100 sec, 300, 2,460, 7,447, and then 15,330 sec. It has been working on the last file for more than 12 hours with little progress. Interestingly, memory usage is again high, but this time 'free' reports almost all the memory as "used"; there is little cached this time around. Snap's CPU usage is much lower (60-80%, where it usually uses 1600% on a 16-core machine), and a number of kernel I/O processes are also using CPU, suggesting heavy system I/O. Perhaps a memory issue with multiple serial samples?
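For context, the serial invocation looks something like this (if I have the multi-run syntax right, commas separate the per-sample groups so the index stays loaded between samples; file names are placeholders):

```
# Hypothetical shape of the serial run (first two of the 192 pairs
# shown); each comma-separated group is aligned in turn against the
# same loaded index.
snap-aligner paired /path/to/index \
    s001_1.fq.gz s001_2.fq.gz -o s001.bam , \
    s002_1.fq.gz s002_2.fq.gz -o s002.bam
```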

I've been able to build and test 1.0dev.104, and can confirm that it seems to solve the memory-control issue when running a single sample. Memory usage now behaves the way I'm used to seeing on Linux.
I'm testing multiple samples now and will report back.
Thank you for your efforts on this!

I can also confirm that the memory issue when running in serial sample mode is fixed: all 192 of my lower-read-count test samples aligned in less than 1 minute. Thanks again!

Fixed in 1.0.4.