muellan/metacache

Segmentation fault (core dumped)

jaimeortiz-david opened this issue · 6 comments

I am getting this error every time I run a query to the database.

Reading database metadata ...
Reading 1 database part(s) ...
Completed database reading. %
custom query sketching settings: -sketchlen 32 -winlen 127 -winstride 112
Classifying query sequences.
Per-Read mappings will be written to file: /local/workdir/metacache/results_DBsimulated/reduced_32/sample_kmer32.txt_sample_10_1m_R1.fq_sample_10_1m_R2.fq.txt
Per-Taxon mappings will be written to file: /local/workdir/metacache/results_DBsimulated/reduced_32/abund_extraction.txt_sample_10_1m_R1.fq_sample_10_1m_R2.fq.txt
[> ] 0%Segmentation fault (core dumped)

Hi! Could you please give more information on when this happens? Is this immediately at the beginning of the query? Is there anything in the output files?

Have you tried different input files for the query? Can you try input files with only a few sequences?
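A small test input can be cut from an existing read file. The following is a minimal sketch, assuming plain (uncompressed) FASTQ with exactly four lines per record; `head_fastq` and the file names are hypothetical helpers for illustration, not part of MetaCache.

```python
from itertools import islice


def head_fastq(src_path, dst_path, n_reads=10):
    """Copy the first n_reads records of a FASTQ file to dst_path.

    Assumes the common 4-line-per-record layout (no wrapped sequences).
    Returns the number of records actually written.
    """
    written = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        while written < n_reads:
            record = list(islice(src, 4))  # header, sequence, '+', qualities
            if len(record) < 4:            # end of file or truncated record
                break
            dst.writelines(record)
            written += 1
    return written
```

For paired-end data, calling it with the same `n_reads` on the R1 and the R2 file keeps the mates in the same order, for example `head_fastq("sample_R1.fq", "small_R1.fq", 10)` followed by the same call for R2.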

Hi,

Thank you for your response. The crash happens immediately at the beginning of the query, every time I query my database. The output files are empty, so I do not have more information to figure out the specific problem.

I have tried different input files, including simulated reads! How many sequences do you suggest I could try as a minimum?
I will cut down the number of sequences on the input file.

PS. On an additional note, I am curious to know if a reference database could be built using only raw reads from the micro-organisms of interest?

I am not sure what's going wrong. If the error occurs for every input you tried, the database could be corrupt. You could try to rebuild the database or try a different database / different genomes. Make sure you have enough disk space available to save the database. The query should work for any number of sequences (even a single sequence).

You can build a database from sequence reads, but for this you might need to create your own taxonomy mapping files (see here).

Could I use both paired-end files as input for the database command?

When building a database all sequences are processed separately, so it is not possible to use the paired information.

Finally, do I have to create only the assembly_summary.txt or do I also need to build my own accession2taxid?

Either is fine. The assembly_summary.txt at the end of your post should be sufficient. MetaCache's build process will tell you if the sequences could be ranked using the files you provided.

I just wanted to add that I also hit this segmentation fault right after the database was reported as "read". I was able to fix my specific issue by running on a node with more memory. Without enough memory, it seems that the database is not loaded properly even though the log says the database was read.
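If memory is the suspect, a rough pre-check is to compare the size of the database files on disk with the memory the node reports as available. The sketch below is Linux-only and rests on two assumptions that are mine, not MetaCache's: that the loaded database needs at least about as much RAM as its files occupy on disk, and that a 20% `headroom` factor is a reasonable margin. Both function names are hypothetical.

```python
import os


def parse_mem_available(meminfo_text):
    """Return MemAvailable in bytes from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024  # the kernel reports this in kB
    return None


def database_fits(db_paths, available_bytes, headroom=1.2):
    """Heuristic: do the database files, plus a margin, fit in available RAM?

    headroom is an assumed safety factor, not a documented requirement.
    """
    total = sum(os.path.getsize(p) for p in db_paths)
    return total * headroom <= available_bytes
```

On a Linux node, the text of `/proc/meminfo` can be passed to `parse_mem_available`, and the result handed to `database_fits` together with the paths of the database files. A `False` result does not prove that the crash is memory-related, but it is a cheap hint to try a larger node first.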