muellan/metacache

Segmentation fault when using --lowest species

donovan-h-parks opened this issue · 3 comments

Hi,

I've run into an issue where MetaCache runs as expected with the following parameters, but crashes with "Command terminated by signal 11" when the -lowest species flag is added:

-pairfiles -no-map -taxids -lineage -separate-cols -threads 32 -abundances profile.tsv -abundance-per species -out classification.log

Am I using an incompatible set of flags, or has the -lowest flag uncovered a bug?

Thanks,
Donovan

Interestingly, everything works if I use -lowest subspecies, which makes me think there is a sequence that somehow has an invalid species name. I'm using the recommended RefSeq DB with the NCBI taxonomy as per the MetaCache instructions. I've noticed that NCBI does sometimes have genomes with an invalid taxon ID (i.e., the NCBI taxonomy has been updated, but the associated genome data has not been updated yet). Perhaps a similar issue is happening here.
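One way to test this hypothesis independently of MetaCache might be to scan the taxon IDs assigned to the reference genomes and flag any that are either missing from the NCBI taxonomy or have no species-rank ancestor. The sketch below is only an illustration and says nothing about how MetaCache resolves ranks internally. It assumes the pipe-delimited layout of NCBI's nodes.dmp (taxon ID, parent ID, rank in the first three fields); the function names and the toy taxonomy are hypothetical.

```python
def parse_nodes(lines):
    """Parse nodes.dmp-style lines into {taxid: (parent_taxid, rank)}."""
    nodes = {}
    for line in lines:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank or malformed lines
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def species_ancestor(taxid, nodes):
    """Return the species-rank taxon at or above taxid, or None if there is none."""
    seen = set()
    # Walk up the lineage; the root is its own parent, so track visited IDs.
    while taxid in nodes and taxid not in seen:
        seen.add(taxid)
        parent, rank = nodes[taxid]
        if rank == "species":
            return taxid
        taxid = parent
    return None


def find_suspect_taxids(taxids, nodes):
    """Taxon IDs that are unknown or have no species-rank ancestor."""
    return sorted(t for t in set(taxids) if species_ancestor(t, nodes) is None)


if __name__ == "__main__":
    # Toy taxonomy with made-up IDs (not real NCBI entries).
    toy = [
        "1\t|\t1\t|\tno rank\t|",
        "10\t|\t1\t|\tgenus\t|",
        "100\t|\t10\t|\tspecies\t|",
        "1000\t|\t100\t|\tstrain\t|",
    ]
    nodes = parse_nodes(toy)
    # 1000 resolves to species 100; 10 sits above species rank; 424242 is unknown.
    print(find_suspect_taxids([1000, 10, 424242], nodes))
```

With a real taxonomy dump, the lines could be read from nodes.dmp and the taxon IDs taken from whatever mapping was used to build the database; any IDs reported would be candidates for the sequence triggering the crash.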

Hi Donovan! I'm not sure where bad taxonomy data could cause a segfault. Invalid taxon ids should be ignored by MetaCache. Does this error happen only with abundance output?
Can you please check if the per-read output works (dropping -no-map) with default output / -taxids-only?

Can I send you the data that is causing the bug? It is ~100 GB, but I can upload it to an FTP site if you can make one available.