amplab/snap

Paired read matcher can use enormous memory for large input with many chimeric reads


The paired read matcher reads an input SAM/BAM file in order and emits matched pairs of reads. In a coordinate-sorted input with many chimerically mapped reads, the two ends of a pair can be far apart in the file. Until the second end shows up, SNAP keeps the first end in memory, not only uncompressed but in a format that wastes a good deal of buffer space.
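To make the memory behavior concrete, here is a minimal Python sketch of a streaming pair matcher under simplifying assumptions. A read is modeled as a hypothetical `(query_name, payload)` tuple, and `match_pairs` is an illustrative name, not SNAP's code (SNAP is written in C++ and buffers full records). The first end of each pair waits in a dictionary until its mate arrives, so the buffer size tracks how many pairs are currently split across the file.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical model of a read: (query_name, payload). Real SAM records carry many more fields.
Read = Tuple[str, str]


def match_pairs(reads: Iterable[Read], stats: Dict[str, int]) -> Iterator[Tuple[Read, Read]]:
    """Yield (first_end, second_end) pairs from a stream of reads.

    Each first end is held in memory until its mate arrives, so memory grows
    with the number of pairs whose second end has not been seen yet.
    """
    pending: Dict[str, Read] = {}
    stats["peak_pending"] = 0
    for read in reads:
        mate = pending.pop(read[0], None)
        if mate is None:
            # First end of this pair: buffer it until the second end shows up.
            pending[read[0]] = read
            stats["peak_pending"] = max(stats["peak_pending"], len(pending))
        else:
            yield mate, read
    # Anything left over never found its mate in this stream.
    stats["unmatched"] = len(pending)


# Hypothetical coordinate-sorted stream: pairs r0..r2 are chimeric (mates on chr1
# and chr7), so their first ends stay buffered while the rest of the file streams by.
stream = [("r0", "chr1"), ("r1", "chr1"), ("r2", "chr1"),
          ("r3", "chr2"), ("r3", "chr2"),
          ("r0", "chr7"), ("r1", "chr7"), ("r2", "chr7")]
stats: Dict[str, int] = {}
pairs = list(match_pairs(stream, stats))
```

Here every chimeric first end stays buffered until the stream reaches its mate's chromosome, so on a whole-genome file the backlog grows with the number of chimeric pairs.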

This can use an inordinate amount of memory for large input files with a high chimeric read fraction. We will need to find some way to mitigate this, probably by spilling to disk.
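One way the spilling idea could work is sketched below in Python. It is an assumption for illustration, not the fix that shipped in 1.0 (the issue does not say how that was done). The hypothetical `match_pairs_spilling` matches in memory as before, but once more than `max_pending` first ends are buffered it writes them to one of `num_buckets` temporary files chosen by a CRC32 hash of the read name, the same partitioning trick a grace hash join uses. Because both ends of a pair hash to the same bucket, a second pass can match each bucket on its own, so peak memory is bounded by `max_pending` during the stream and by roughly one bucket's worth of spilled reads afterwards.

```python
import os
import tempfile
import zlib
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical read model: (query_name, payload); payload must not contain tabs or newlines.
Read = Tuple[str, str]


def _bucket(name: str, num_buckets: int) -> int:
    # Stable hash of the read name, so both ends of a pair land in the same bucket file.
    return zlib.crc32(name.encode()) % num_buckets


def match_pairs_spilling(reads: Iterable[Read], max_pending: int,
                         num_buckets: int = 16) -> Iterator[Tuple[Read, Read]]:
    pending: Dict[str, Read] = {}
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"bucket{i}.tsv") for i in range(num_buckets)]
        files = [open(p, "w") for p in paths]

        def spill() -> None:
            # Move every buffered first end to its bucket file and free the memory.
            for name, payload in pending.values():
                files[_bucket(name, num_buckets)].write(f"{name}\t{payload}\n")
            pending.clear()

        try:
            # Phase 1: stream the input, matching in memory while the buffer is small.
            for read in reads:
                mate = pending.pop(read[0], None)
                if mate is not None:
                    yield mate, read
                    continue
                pending[read[0]] = read
                if len(pending) > max_pending:
                    spill()
            # Reads still buffered may be mates of spilled reads, so send them to disk too.
            spill()
        finally:
            for f in files:
                f.close()

        # Phase 2: each pair now sits within a single bucket; match buckets one at a time.
        for path in paths:
            bucket: Dict[str, Read] = {}
            with open(path) as f:
                for line in f:
                    name, payload = line.rstrip("\n").split("\t", 1)
                    mate = bucket.pop(name, None)
                    if mate is None:
                        bucket[name] = (name, payload)
                    else:
                        yield mate, (name, payload)
            # Whatever remains in `bucket` is a true orphan and is dropped here.


stream = [("r0", "chr1"), ("r1", "chr1"), ("r2", "chr1"),
          ("r3", "chr2"), ("r3", "chr2"),
          ("r0", "chr7"), ("r1", "chr7"), ("r2", "chr7")]
pairs = list(match_pairs_spilling(stream, max_pending=2, num_buckets=4))
```

Hashing the read name rather than, say, the mapped position keeps the two ends together wherever each one aligned. The cost is one extra write and read of every spilled record, and pairs recovered from disk come out after the in-memory matches rather than in input order.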

Fixed in 1.0.