eldariont/svim-asm

using more than 64G RAM

cschin opened this issue · 3 comments

Hi, @eldariont, thanks for providing this tool. I tried it out today. With one haploid assembly (human), I ran out of 64 GB of memory. Is such high memory usage expected? Do you have an estimate of how much memory is needed to call SVs from the assembled contigs of one human haplotype?

Hi Jason,
thanks for reporting this issue. No, such high memory consumption is not expected at all. In my experiments I never ran out of memory, but now that you and another user have reported this, I looked deeper into it.

It turns out that most of the memory is consumed when SVIM-asm reconstructs alignment segments from the SA tag of primary alignments. This is great for SV calling because I can treat reconstructed supplementary alignments the same way as primary alignments that were read directly from the BAM file. However, each reconstructed alignment segment stored its own copy of the original contig sequence, so memory usage could blow up for long contigs with many alignment segments.
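To make the problem concrete, here is a minimal sketch (not SVIM-asm's actual code; `SegmentRecord`, `parse_sa_tag` and `naive_sequence_bytes` are hypothetical names) of reconstructing segments from an SA tag, whose entries follow the SAM format `rname,pos,strand,CIGAR,mapQ,NM;`. The record deliberately keeps no contig sequence. The helper shows why per-segment copies hurt: sequence memory scales with contig length times the number of segments.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentRecord:
    """Hypothetical lightweight record for one reconstructed supplementary alignment.

    Note: no copy of the contig sequence is stored here.
    """
    contig: str
    ref_name: str
    ref_start: int  # 0-based, converted from the 1-based SA position
    is_reverse: bool
    cigar: str
    mapq: int
    nm: int


def parse_sa_tag(contig_name: str, sa_value: str) -> list[SegmentRecord]:
    """Parse an SA tag value (rname,pos,strand,CIGAR,mapQ,NM;...) into records."""
    records = []
    for entry in sa_value.split(";"):
        if not entry:
            continue  # the trailing ';' yields an empty string
        rname, pos, strand, cigar, mapq, nm = entry.split(",")
        records.append(SegmentRecord(
            contig=contig_name,
            ref_name=rname,
            ref_start=int(pos) - 1,
            is_reverse=(strand == "-"),
            cigar=cigar,
            mapq=int(mapq),
            nm=int(nm),
        ))
    return records


def naive_sequence_bytes(contig_len: int, n_segments: int) -> int:
    """Bytes spent on sequence if every segment stores a full contig copy (1 byte/base)."""
    return contig_len * n_segments
```

For illustration only (hypothetical numbers, not measurements): a 10 Mb contig split into 200 segments would hold `naive_sequence_bytes(10_000_000, 200)` = 2,000,000,000 bytes, roughly 2 GB, of redundant sequence for that one contig.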

Luckily, I found that it's not necessary at all to store the contig sequence in the alignment segment because pysam infers it from the CIGAR string if not provided. Without storing the contig sequence, memory consumption dropped substantially in my experiments while the output remained the same.
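As a rough illustration of how much can be recovered from the CIGAR string alone, the sketch below derives the query length by summing the operations that consume query bases (M, I, S, =, X per the SAM spec). Including hard clips (H) gives the full contig length. This is a hypothetical pure-Python helper, not pysam's implementation; as we understand it, pysam's `AlignedSegment.infer_query_length` and `infer_read_length` compute the analogous quantities.

```python
import re

# CIGAR ops that consume query bases (SAM spec): M, I, S, =, X
_QUERY_CONSUMING = set("MIS=X")
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def infer_query_length(cigar: str, include_hard_clips: bool = False) -> int:
    """Derive the query length from a CIGAR string alone (hypothetical helper).

    With include_hard_clips=True, H operations are counted too, which yields
    the length of the full original contig rather than the stored query.
    """
    ops = _CIGAR_RE.findall(cigar)
    # Reject strings containing characters the regex did not match
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    total = 0
    for length, op in ops:
        if op in _QUERY_CONSUMING or (include_hard_clips and op == "H"):
            total += int(length)
    return total
```

For example, `infer_query_length("1000H500M20I30D")` gives 520 (the deletion consumes no query bases), and passing `include_hard_clips=True` gives 1520.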

In a minute, I will push a commit that contains the fix. It would be great if you could quickly confirm whether that fixes the problem for you, too. If that's the case I will draft a new release and update bioconda accordingly.

Cheers
David

@eldariont Thanks, I ran it today. Memory usage is no longer a problem. Very cool.

Thanks for checking and sorry for the inconvenience you had with this issue.
I drafted a new release (v1.0.1), which is already available via PyPI and will be on bioconda very soon.