bbuchfink/diamond

Segmentation fault (core dumped) in V2.1.9

miparrama opened this issue · 11 comments

I just installed Diamond V2.1.9 with Conda and got this error after a few seconds of running:
Segmentation fault (core dumped).
I downgraded to V2.1.8 and it worked perfectly.

What's your command line and what are the input files?

The command I run is part of the CheckV pipeline:

diamond blastp --outfmt 6 --evalue 1e-5 --query-cover 50 --subject-cover 50 -k 10000 --query ./metaviralspades/virsorter2/tmp/proteins.faa --db /home/m.p.martinez/resources/checkv/checkv-db-v1.5/genome_db/checkv_reps.dmnd --threads 1 > ./metaviralspades/virsorter2/tmp/diamond.tsv 2> ./metaviralspades/virsorter2/tmp/diamond.tsv.log

The query FASTA file has this data:
[image]

Are you on a Mac or Linux?

Confirming this same issue. Removing --query-cover 50 --subject-cover 50 or downgrading solves the issue. Running this on Debian Linux.
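
To narrow down which option triggers the crash, one could generate the same call with and without the coverage filters and run both variants. The following is a minimal sketch that only assembles the argument lists from the command posted above; the helper name `build_diamond_cmd` and the file names are hypothetical, and nothing is executed.

```python
from typing import List, Optional


def build_diamond_cmd(
    query: str,
    db: str,
    query_cover: Optional[int] = 50,
    subject_cover: Optional[int] = 50,
) -> List[str]:
    """Assemble the blastp call from this thread; pass None to drop a coverage filter."""
    cmd = ["diamond", "blastp", "--outfmt", "6", "--evalue", "1e-5"]
    if query_cover is not None:
        cmd += ["--query-cover", str(query_cover)]
    if subject_cover is not None:
        cmd += ["--subject-cover", str(subject_cover)]
    cmd += ["-k", "10000", "--query", query, "--db", db, "--threads", "1"]
    return cmd


# Hypothetical file names, for illustration only.
with_filters = build_diamond_cmd("proteins.faa", "checkv_reps.dmnd")
without_filters = build_diamond_cmd("proteins.faa", "checkv_reps.dmnd", None, None)

print(" ".join(with_filters))
print(" ".join(without_filters))
```

Each list can be passed to `subprocess.run` unchanged; if only the first variant crashes, that points at the coverage filters, which matches the observation above.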

I run it on Ubuntu 20.04 LTS.

I can't reproduce a segfault with these settings. Could you send me the query file?

I have a potentially similar issue wwood/singlem#156 - or similar to #780, which is usually fine once the input files have ~10M sequences.

I tested on Ubuntu 22.04 with an AMD 5950X and 128 GB of memory. I get the same result with Linux kernel 5.15.0-14 and now with kernel 6.7.4.

Hi @bbuchfink, thanks for keeping this great software maintained.
I can confirm that v2.1.9 introduced a bug that makes Diamond crash consistently with a segmentation fault (error code -11).

I manually tested this, as we recently got a steep influx of filed issues on Bakta regarding Diamond. Our users report that downgrading to v2.1.8 helps.
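
For readers wondering about the error code -11 mentioned above: Python's `subprocess` module reports a child process that was killed by signal N as return code -N, and signal 11 is SIGSEGV on Linux. The sketch below shows how a wrapper might turn such a return code into a readable message. It assumes the -11 was reported by a Python wrapper, which the thread does not state explicitly, and `describe_returncode` is a hypothetical helper name.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Translate a subprocess return code into a short human-readable message."""
    if returncode < 0:
        # A negative value means the child was terminated by a signal.
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by unknown signal {-returncode}"
    return f"exited with status {returncode}"


print(describe_returncode(-11))
print(describe_returncode(0))
```

In a pipeline, the argument would typically be the `returncode` attribute of the object returned by `subprocess.run`.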

Command line: diamond blastp --db /home/oliver/tmp/bakta-test-amrf/db-light/pscc.dmnd --query /tmp/tmponmrzb8c/cds.pscc.faa --out diamond.pscc.tsv --id 50 --query-cover 80 --subject-cover 80
Output:

diamond v2.1.9.163 (C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

#CPU threads: 8
Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: 
#Target sequences to report alignments for: 25
Opening the database...  [0.086s]
Database: /home/oliver/tmp/bakta-test-amrf/db-light/pscc.dmnd (type: Diamond database, sequences: 3134924, letters: 1096161454)
Block size = 2000000000
Opening the input file...  [0s]
Opening the output file...  [0s]
Loading query sequences...  [0s]
Length sorting queries...  [0s]
Masking queries...  [0s]
Building query seed set...  [0s]
Algorithm: Query-indexed
Building query histograms...  [0s]
Seeking in database...  [0s]
Loading reference sequences...  [1.848s]
Length sorting reference...  [1.988s]
Initializing temporary storage...  [0s]
Building reference histograms...  [2.188s]
Allocating buffers...  [0s]
Processing query block 1, reference block 1/1, shape 1/2.
Building reference seed array...  [1.488s]
Building query seed array...  [0s]
Computing hash join...  [0.049s]
Searching alignments...  [0s]
Deallocating memory...  [0s]
Processing query block 1, reference block 1/1, shape 2/2.
Building reference seed array...  [1.357s]
Building query seed array...  [0s]
Computing hash join...  [0.037s]
Searching alignments...  [0s]
Deallocating memory...  [0s]
Deallocating buffers...  [0.062s]
Clearing query masking...  [0s]
Computing alignments... Loading trace points...  [0.001s]
Sorting trace points...  [0s]
Computing alignments... Segmentation fault (core dumped)

I used Diamond v2.1.9 on Linux via Conda.
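
Until a fixed release is available, a downstream pipeline could inspect the banner line Diamond prints (as in the log above) and warn when it sees the affected version. This is a minimal sketch under two assumptions: that the banner keeps the `diamond vX.Y.Z` form shown in the log, and that 2.1.9 is the only affected release, which is what this thread reports. The function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches the start of the banner line, e.g. "diamond v2.1.9.163 (C) Max Planck ...".
BANNER_RE = re.compile(r"^diamond v(\d+)\.(\d+)\.(\d+)")

# Assumption based on this thread: only 2.1.9 shows the crash.
AFFECTED = {(2, 1, 9)}


def parse_banner_version(line: str) -> Optional[Tuple[int, ...]]:
    """Return (major, minor, patch) from a banner line, or None if it does not match."""
    match = BANNER_RE.match(line.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_affected(line: str) -> bool:
    """True if the banner line belongs to a release listed in AFFECTED."""
    return parse_banner_version(line) in AFFECTED


banner = "diamond v2.1.9.163 (C) Max Planck Society for the Advancement of Science"
print(parse_banner_version(banner), is_affected(banner))
```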

I can reproduce the problem now, will fix.

@bbuchfink, I saw that you pushed the fix for this issue to the main repo back in February, but haven't tagged or published the 2.1.10 release yet. Any plans to do that in the near future, or do you intend to include some additional changes in 2.1.10?