muellan/metacache

Segmentation fault

alex-dem opened this issue · 7 comments

Hello,

If I use the query option with a FASTQ file containing a read with more than 15 Ts in a row, I get a segmentation fault. If I replace the 16 Ts with 16 (or more) As, there is no segmentation fault.

This fastq works properly:
@QHOT2:10403:11886
CTGACAGCATGTTGACTTTTGCTTCCTTTGTCAAGCACAGAAAAACAGGAAGCACAAAATCATTTTTTTTTTTTTTT
+

But if I add an extra T at the end, I get a segmentation fault.
@QHOT2:10403:11886
CTGACAGCATGTTGACTTTTGCTTCCTTTGTCAAGCACAGAAAAACAGGAAGCACAAAATCATTTTTTTTTTTTTTTT
+

thank you
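For anyone trying to reproduce this, here is a minimal sketch that writes the two test reads to disk. It assumes the reads end in runs of exactly 15 and 16 Ts, as described above. The quality strings are not shown in the report, so the constant `I` quality line and the file names `ok.fq` and `crash.fq` are placeholders, not the original data.

```python
from itertools import groupby
from pathlib import Path

# Shared prefix of both reads, copied from the report.
PREFIX = "CTGACAGCATGTTGACTTTTGCTTCCTTTGTCAAGCACAGAAAAACAGGAAGCACAAAATCA"
READ_OK = PREFIX + "T" * 15      # reported to work
READ_CRASH = PREFIX + "T" * 16   # reported to segfault


def longest_run(sequence):
    """Return (base, length) of the longest homopolymer run in the sequence."""
    runs = ((base, sum(1 for _ in group)) for base, group in groupby(sequence))
    return max(runs, key=lambda run: run[1])


def fastq_record(name, sequence):
    """Format one 4-line FASTQ record with a placeholder quality string."""
    quality = "I" * len(sequence)  # assumption: real qualities are unknown
    return f"@{name}\n{sequence}\n+\n{quality}\n"


if __name__ == "__main__":
    Path("ok.fq").write_text(fastq_record("QHOT2:10403:11886", READ_OK))
    Path("crash.fq").write_text(fastq_record("QHOT2:10403:11886", READ_CRASH))
    print(longest_run(READ_OK), longest_run(READ_CRASH))
```

Each file can then be passed to `./metacache query refseq.db <file>` to compare the two cases.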

We did a quick check and could not reproduce your problem, so we need more information about your setup:

  • Which version of MetaCache do you use / did you try the latest version?
  • What is your g++ version?
  • Did you use any additional compilation flags/macros (for setting different data type widths)?
  • Did you query against the default refseq database or did you build a custom database?
  • If you didn't use the defaults, which build parameters (-winlen, -sketchlen, -kmerlen, ...) and which query parameters did you use?

  • I downloaded the code from git last week. Is there an option to see the version from the executable?
  • I am running MetaCache on Debian 10 with g++ (Debian 8.3.0-6) 8.3.0.
  • I used the default settings during compilation.
  • I ran metacache-build-refseq to build the database.

I also ran the included tests and everything works fine.

The same fault happens if I trim the read from the beginning:

@QHOT2:10403:11886
AAAAACAGGAAGCACAAAATCATTTTTTTTTTTTTTTT
+

Could you give me the output of the commands:

  • metacache info
  • metacache info <your database name>
    and also the command that you used for querying?

./metacache info

MetaCache version 0.6.1 (20190925)
database version 20190916

sequence type std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >
target id type unsigned short int 16 bits
target limit 65535

window id type unsigned int 32 bits
window limit 4294967295
window length 128
window stride 113

sketcher type mc::single_function_unique_min_hasher<unsigned int, mc::same_size_hash<unsigned int> >
feature type unsigned int 32 bits
feature hash mc::same_size_hash<unsigned int>
kmer size 16
kmer limit 16
sketch size 16

bucket size type unsigned char 8 bits
max. locations 254
location limit 254

hit classifier mc::best_distinct_matches_in_contiguous_window_ranges

./metacache info refseq.db
Reading database from file 'refseq.db' ... done.

MetaCache version 0.6.1 (20190925)
database version 20190916

sequence type std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >
target id type unsigned short int 16 bits
target limit 65535

window id type unsigned int 32 bits
window limit 4294967295
window length 128
window stride 113

sketcher type mc::single_function_unique_min_hasher<unsigned int, mc::same_size_hash<unsigned int> >
feature type unsigned int 32 bits
feature hash mc::same_size_hash<unsigned int>
kmer size 16
kmer limit 16
sketch size 16

bucket size type unsigned char 8 bits
max. locations 254
location limit 254

hit classifier mc::best_distinct_matches_in_contiguous_window_ranges

I ran the command ./metacache query refseq.db dummy.fq
and I also tried the interactive mode:
./metacache query refseq.db

dummy.fq

Both gave me a segmentation fault.

Hm, these are all default values.
Does the segfault also happen if you use just the one sequence from above, or only if this sequence is part of a larger FASTQ file?

Could it be that the database building process didn't finish properly?
You could try to check out the latest version from the GitHub repo, compile it again and build the database again.
I know this isn't very helpful, but I really don't have any other ideas at the moment, since we are unable to reproduce your bug.

Ok, I'll recompile everything from scratch and rebuild the database.
FYI, the segfault was happening on a larger file. By repeatedly dividing the file in half, I narrowed it down to one read (of possibly many) that causes a segfault.
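The halving procedure can be automated. The following is only a sketch of that idea, not the exact steps used here: it assumes an unwrapped 4-line FASTQ file, a POSIX system (where a negative return code means the process was killed by a signal), and the `./metacache query refseq.db <file>` command quoted earlier. The function names are made up for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def read_fastq(path):
    """Return the file as a list of 4-line records (assumes unwrapped FASTQ)."""
    lines = Path(path).read_text().splitlines()
    return [lines[i:i + 4] for i in range(0, len(lines), 4)]


def segfaults(records, database="refseq.db", binary="./metacache"):
    """Query the records from a temporary file; True if the run dies from a signal."""
    with tempfile.NamedTemporaryFile("w", suffix=".fq", delete=False) as handle:
        for record in records:
            handle.write("\n".join(record) + "\n")
    try:
        result = subprocess.run([binary, "query", database, handle.name],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
    finally:
        Path(handle.name).unlink()
    # On POSIX, a negative return code is the negated signal number
    # (SIGSEGV gives -11).
    return result.returncode < 0


def bisect_records(records, crashes):
    """Halve `records` while one half still crashes; return what is left."""
    while len(records) > 1:
        middle = len(records) // 2
        first, second = records[:middle], records[middle:]
        if crashes(first):
            records = first
        elif crashes(second):
            records = second
        else:
            break  # the crash needs reads from both halves
    return records


if __name__ == "__main__":
    reads = read_fastq(sys.argv[1])
    if segfaults(reads):
        for record in bisect_records(reads, segfaults):
            print("\n".join(record))
```

Every step reloads the database, so this is slow on a full refseq build. It also stops at the first offending read it finds and does not tell you whether other reads in the file trigger the same crash.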

Thanks for your time and I'll let you know how it went!

Everything works as expected now. I downloaded the source from git, compiled it, and rebuilt the database, and I no longer get a segmentation fault.
Thank you for your help!