smarco/BiWFA-paper

Memory issues with BiWFA for processing millions of paired alignments

Hi BiWFA team,

Thank you for making this fast and efficient library available for everyone.
I am trying to integrate the library into my project. It works for smaller data sets of 2-3 million reads, but with more than 5 million reads it fails with a segmentation fault. The system has 128 GB of RAM. After testing different 'attributes.memory_mode' settings, I found that BiWFA runs out of memory after a certain number of alignments and then throws a segmentation fault.

The project involves pairwise comparison of n million DNA queries (Illumina reads) against m different reference sequences (amplicons). I am calling the following function:

std::string nw_function(std::string refseq, std::string query){
    char *pattern;
    char *text;
    pattern = &refseq[0];
    text = &query[0];
    // Configure alignment attributes
    wavefront_aligner_attr_t attributes = wavefront_aligner_attr_default;
    attributes.distance_metric = gap_affine;
    attributes.alignment_form.span = alignment_end2end;
    attributes.affine_penalties.match = 0;
    attributes.affine_penalties.mismatch = 4;
    attributes.affine_penalties.gap_opening = 20;
    attributes.affine_penalties.gap_extension = 2;
    attributes.memory_mode = wavefront_memory_ultralow;
    // Initialize Wavefront Aligner
    wavefront_aligner_t* const wf_aligner = wavefront_aligner_new(&attributes);
    // Align
    wavefront_bialign(wf_aligner,pattern,refseq.length(),text,refseq.length());
    std::string mycig = get_cigar_string(wf_aligner->cigar,true);
    // Free
    wavefront_aligner_delete(wf_aligner);
    return mycig;
}

I tried your WFA library first and ran into similar issues at a much lower read count. For this reason I moved to BiWFA, which significantly increased the number of reads I can process, but not enough to solve the problem. Could you help identify a solution? In particular, which parameters would I need to change so that BiWFA does not run out of memory?
I greatly appreciate your help.

Hi @GSbioinfo, I suggest opening an issue on the WFA2-lib repository, as it is the best and most up-to-date implementation we have of all WFA-related algorithms. The implementation in BiWFA-paper is quite old and differs from the newer one.
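
Two things may be worth checking while moving to WFA2-lib. This is a guess from reading the snippet, not a confirmed diagnosis. First, the function in the question passes refseq.length() as the length of both sequences, so the aligner may read past the end of the query whenever the query is shorter than the reference. Second, it creates and deletes an aligner on every call, which is avoidable work across millions of alignments. The sketch below shows the general pattern of reusing one aligner handle and passing each sequence with its own length. FakeAligner, fake_aligner_new, fake_aligner_delete, fake_align and align_all are hypothetical stand-ins invented for this illustration so that the code compiles on its own. They are not part of WFA2-lib or BiWFA-paper and would need to be replaced by the real library calls.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a library aligner handle (NOT a WFA2-lib type).
struct FakeAligner {
    std::size_t alignments_done = 0;
};

// Counters that make the handle's lifetime observable in this sketch.
static int g_created = 0;
static int g_deleted = 0;

// Hypothetical stand-ins for the library's create/destroy functions.
FakeAligner* fake_aligner_new() { ++g_created; return new FakeAligner(); }
void fake_aligner_delete(FakeAligner* a) { ++g_deleted; delete a; }

// Hypothetical stand-in for the align call. It does no real alignment and
// returns the sum of the two lengths, so a caller can verify that each
// sequence was passed together with its own length.
std::size_t fake_align(FakeAligner* a,
                       const char* pattern, std::size_t pattern_len,
                       const char* text, std::size_t text_len) {
    assert(a != nullptr && pattern != nullptr && text != nullptr);
    ++a->alignments_done;
    return pattern_len + text_len;
}

// RAII wrapper: the handle is released exactly once, even on early return.
struct AlignerDeleter {
    void operator()(FakeAligner* a) const { fake_aligner_delete(a); }
};
using AlignerHandle = std::unique_ptr<FakeAligner, AlignerDeleter>;

// Align every query against one reference, reusing a single aligner.
std::vector<std::size_t> align_all(const std::string& refseq,
                                   const std::vector<std::string>& queries) {
    AlignerHandle aligner(fake_aligner_new());  // created once, not per query
    std::vector<std::size_t> results;
    results.reserve(queries.size());
    for (const std::string& query : queries) {
        // The text length comes from the query, not from the reference.
        results.push_back(fake_align(aligner.get(),
                                     refseq.data(), refseq.size(),
                                     query.data(), query.size()));
    }
    return results;
}
```

Taking the sequences by const reference also avoids copying both strings on every call, which the by-value parameters in the original function do.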