ekg/gimbricate

Gimbricate segfaults on a particular GFA

RolandFaure opened this issue · 9 comments

We are running gimbricate on a graph produced by an assembler, with the command line:

gimbricate -g assemblyGraph_k63.gfa -n -p h.paf -f h.fasta >h.gimbry.gfa

It segfaults after a few seconds.
On inspection, it seems to have written the PAF and FASTA files correctly, and to have segfaulted after writing many sequences to h.gimbry.gfa but before writing any links.
It does not seem to be caused by running out of RAM, since the segfault happens very quickly. Other graphs produced by the same assembler ran fine with gimbricate, so we suspect the error comes from a particular topological feature of this GFA. Do you know of any such limitation of gimbricate?
Thanks in advance :)

ekg commented

It is indeed a GFA with very long overlaps. However, I am not actually using the master branch of gimbricate but the fork we made (the ege-eeb fork, see issue #2), in which the Smith-Waterman alignment is removed from align.cpp precisely to handle large, perfect alignments (it segfaults in the same way with the ekg version of gimbricate). I checked, and gimbricate segfaulted before ever entering align.cpp.

Does that mean that gimbricate tries to allocate memory for SW before entering align.cpp?
Or is that memory needed for something else entirely?

ekg commented

Using gdb, the error occurs at:
#3 0x0000555555581898 in std::function<void (gfak::edge_elem const&)>::operator()(gfak::edge_elem const&) const (__args#0=..., this=0x7fffffffc5e0)
at /usr/include/c++/7/bits/std_function.h:706
#4 gfak::GFAKluge::for_each_edge_line_in_file(char*, std::function<void (gfak::edge_elem const&)>) (this=0x7fffffffc370, filename=,
func=...) at gimbricate/deps/gfakluge/src/gfakluge.hpp:936
#5 0x0000555555568d17 in main (argc=, argv=)
at gimbricate/src/main.cpp:132

I've checked the GFA and found no overlaps longer than the segments they join, and no links referring to nonexistent segments. Moreover, all other GFA files produced by the same assembler are processed without problems.
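The two checks described above can be scripted. The sketch below is an illustration only, not part of gimbricate, and its function names (overlap_length, segment_length, check_gfa_links) are hypothetical. It assumes tab-separated GFA1 S and L records, takes a segment's length from its sequence (or from an LN:i: tag when the sequence is *), and counts only M, = and X operations in the overlap CIGAR, which is an assumption that fits plain k-mer overlaps such as 62M.

```python
import re

# One CIGAR operation: a count followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHPX=])")


def overlap_length(cigar):
    """Overlap length of a GFA1 L-record CIGAR, counting only M/=/X ops
    (assumption: overlaps are plain k-mer matches such as '62M')."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=X")


def segment_length(fields):
    """Length of an S record: its sequence, else the optional LN:i: tag."""
    if fields[2] != "*":
        return len(fields[2])
    for tag in fields[3:]:
        if tag.startswith("LN:i:"):
            return int(tag[5:])
    return None  # length unknown


def check_gfa_links(lines):
    """Report L records that point to missing segments, or whose overlap
    is longer than one of the two segments they connect."""
    seg_len, links = {}, []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            seg_len[fields[1]] = segment_length(fields)
        elif fields[0] == "L" and len(fields) >= 6:
            links.append(fields)

    problems = []
    for f in links:
        frm, to, cigar = f[1], f[3], f[5]
        missing = [name for name in (frm, to) if name not in seg_len]
        if missing:
            problems.extend(f"L {frm} -> {to}: missing segment {name}" for name in missing)
            continue
        if cigar == "*":
            continue  # no overlap given
        ov = overlap_length(cigar)
        for name in (frm, to):
            length = seg_len[name]
            if length is not None and ov > length:
                problems.append(f"L {frm} -> {to}: overlap {ov} > length of {name} ({length})")
    return problems


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            for problem in check_gfa_links(fh):
                print(problem)
```

Run as python check_links.py assemblyGraph_k63.gfa (the script name is arbitrary). An empty output means neither problem was found in the records the script could parse.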

ekg commented

@RolandFaure the problem here was that your GFA file didn't have a header. Thus, the first record was skipped, but the link records associated with it were read. This caused the segfault.
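To see why a missing header turns into a dangling link, here is a deliberately simplified model. It assumes, as in the explanation above, that the reader discards the first line as the header without checking its record type. It is not gfakluge's actual code, and read_gfa and dangling_links are hypothetical names.

```python
def read_gfa(lines, assume_header):
    """Toy GFA1 reader returning (segments, links).

    With assume_header=True the first line is dropped without looking at it,
    mimicking the behaviour described above. With assume_header=False every
    line is dispatched on its record type, so H lines are simply ignored."""
    body = lines[1:] if assume_header else lines
    segments, links = {}, []
    for line in body:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            links.append((fields[1], fields[3]))
    return segments, links


def dangling_links(segments, links):
    """Links whose endpoints were never loaded as segments. A C++ tool that
    looks such an id up without checking can crash at this point."""
    return [(a, b) for a, b in links if a not in segments or b not in segments]


gfa_without_header = [
    "S\t1\tACGT",
    "S\t2\tCGTA",
    "L\t1\t+\t2\t+\t3M",
]

print(dangling_links(*read_gfa(gfa_without_header, assume_header=True)))   # [('1', '2')]
print(dangling_links(*read_gfa(gfa_without_header, assume_header=False)))  # []
```

Dispatching on the record letter of every line, rather than assuming a fixed position for the header, removes this failure mode in the toy model.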

(printf 'H\tVN:Z:1.0\n'; cat assemblyGraph_k63.gfa) >assemblyGraph_k63.fixed.gfa
gimbricate -g assemblyGraph_k63.fixed.gfa -p assemblyGraph_k63.paf -f assemblyGraph_k63.fa -t 16 >assemblyGraph_k63.gimbry.gfa

This runs without issue (at least on the edlib-overlaps branch).

I've pushed a fix that checks that the header is present and that the file appears to be GFAv1.
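For anyone who wants the same sanity check before running an older build, here is a minimal sketch. It is not the code in the pushed fix, and looks_like_gfa1 is a hypothetical name. It assumes that "apparently GFAv1" means: the first line is an H record, any VN:Z: tag on it starts with "1.", and every other non-empty line starts with a GFA1 record letter (H, S, L, C, P, W from GFA 1.1, or # for comments).

```python
# Record letters accepted as GFA1 (W walk lines were added in GFA 1.1).
GFA1_RECORD_TYPES = {"H", "S", "L", "C", "P", "W", "#"}


def looks_like_gfa1(lines):
    """Return (ok, reason) after a cheap structural check of a GFA1 file.

    Streams over the lines, so it also works on an open file handle."""
    it = iter(lines)
    first = next(it, "").rstrip("\n").split("\t")
    if first[0] != "H":
        return False, "missing header: first line is not an H record"
    for tag in first[1:]:
        if tag.startswith("VN:Z:") and not tag[5:].startswith("1."):
            return False, f"unsupported GFA version {tag[5:]}"
    for number, line in enumerate(it, start=2):
        line = line.rstrip("\n")
        if line and line[0] not in GFA1_RECORD_TYPES:
            return False, f"line {number}: unexpected record type {line[0]!r}"
    return True, "ok"


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        with open(sys.argv[1]) as fh:
            ok, reason = looks_like_gfa1(fh)
        print(reason)
        sys.exit(0 if ok else 1)
```

On a headerless file like the original assemblyGraph_k63.gfa this reports the missing H record; after prepending the header line it should print ok, provided all records use GFA1 record types.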