Illumina/DRAGMAP

Dragmap failed ERROR: This thread caught an exception first

Hi,

I want to test dragmap (I currently use bwa-mem2), but I get an error.
First, a note: I run dragmap from a Conda environment, using the latest version.
Command used:

dragen-os \
-r ${REF_Genome} \
-1 ${Fastq_DIR}/${read1} \
-2 ${Fastq_DIR}/${read2} \
--RGID HG001 \
--RGSM HG001 \
--num-threads ${CPU_number} \
| samtools view \
-b \
-h \
-L ${BED} \
-@ 2 \
> ${Align_DIR}/${ID}.trimmed.align.filtered.bam 2> ${Align_DIR}/logs/${ID}.trimmed.align.filtered.log

It failed after less than 2 minutes. At first it seems to run normally with multiple threads, but then it drops to a single thread and eventually fails.
I do get the beginning of a BAM file with aligned reads.

I am using 11 threads and have 90 GB of memory.

The log file:

2021-10-20 18:51:23 [2ba35fd534c0] Version: 1.2.1
2021-10-20 18:51:23 [2ba35fd534c0] argc: 13 argv: dragen-os -r /shared/projects/gentaumix/dragen/reference -1 /shared/projects/gentaumix/HG001/02_Trimming/fastq_drag/HG001.trimmed.R1.fastq.gz -2 /shared/projects/gentaumix/HG001/02_Trimming/fastq_drag/HG001.trimmed.R2.fastq.gz --RGID HG001 --RGSM HG001 --num-threads 11
decompHashTableCtxInit...
0.824 seconds
decompHashTableHeader...
0.002 seconds
decompHashTableLiterals...
1.926 seconds
decompHashTableExtIndex...
0.041 seconds
decompHashTableAutoHits...
44.869 seconds
decompHashTableSetFlags...
6.205 seconds
finished decompress
Running dual fastq workflow on 11 threads. System supports 56 threads.
0 249 0 0 0 0 10000 1 40000 1 1000 0 0 0 6
0 250 0 0 0 0 10000 1 40000 1 1000 0 0 0 5
0 251 0 0 0 0 10000 1 40000 1 1000 0 0 0 4
0 252 0 0 0 0 10000 1 40000 1 1000 0 0 0 3
0 253 0 0 0 0 10000 1 40000 1 1000 0 0 0 2
0 254 0 0 0 0 10000 1 40000 1 1000 0 0 0 1
0 0 271 361 490 392.361 158.769 1 1147 1 789 89456 90372 0 0
Initial paired-end statistics detected for read group all, based on 89456 high quality pairs for FR orientation
Quartiles (25 50 75) = 271 361 490
Mean = 392.361
Standard deviation = 158.769
Rescue radius = 396.924
Effective rescue sigmas = 2.5
Boundaries for mean and standard deviation: low = 1, high = 928
Boundaries for proper pairs: low = 1, high = 1147
NOTE: DRAGEN's insert estimates include corrections for clipping (so they are not identical to TLEN)
[47982249010944] ERROR: This thread caught an exception first

Another note: I get exactly the same error if I write the dragmap output to a SAM file instead of piping it to samtools view.
One more note: I also tried version 1.2.0 and got the same error.

How can I solve this?

rizkg commented

Hi,
If this is public data, could you share the input FASTQ so that I can replicate the error?

The FASTQ files come from this accession: SRR14724533
I ran fastp with the default options plus poly-G tail trimming,
then used the FASTQ output of fastp as input for dragmap.

rizkg commented

Thanks for the info! I'll get back to you when I have news.

I just tested dragmap directly on the FASTQ files from this accession (without running fastp) and it works. However, it is quite slow: 16 h for a 30x human genome, whereas bwa-mem2 takes 7.2 h on the same sample with equivalent resources. I suppose this is the kind of thing that will improve in future versions.

rizkg commented

Hi, we were able to replicate the issue and found the cause. A fix will be available soon.

Hi folks.
I'm seeing the same error using some private in-house whole genome data:

dragen-os -r hg38_no_alt_dragmap_ref -b B46157_4_lanes_dupsFlagged.bam
2022-01-10 14:27:53 	[7f15d6033740]	Version: 1.2.1
2022-01-10 14:27:53 	[7f15d6033740]	argc: 5 argv: dragen-os -r hg38_no_alt_dragmap_ref -b B46157_4_lanes_dupsFlagged.bam
decompHashTableCtxInit...
  1.184 seconds
decompHashTableHeader...
  0.002 seconds
decompHashTableLiterals...
  3.299 seconds
decompHashTableExtIndex...
  0.094 seconds
decompHashTableAutoHits...
  24.441 seconds
decompHashTableSetFlags...
  2.636 seconds
finished decompress
Running fastq workflow on 144 threads. System supports 144 threads.
0	249	0	0	0	0	10000	1	40000	1	1000	0	0	0	6	
0	250	0	0	0	0	10000	1	40000	1	1000	0	0	0	5	
0	251	0	0	0	0	10000	1	40000	1	1000	0	0	0	4	
0	252	0	0	0	0	10000	1	40000	1	1000	0	0	0	3	
0	253	0	0	0	0	10000	1	40000	1	1000	0	0	0	2	
0	254	0	0	0	0	10000	1	40000	1	1000	0	0	0	1	
[139729232258816]	ERROR: This thread caught an exception first

I see that error after about 3 hours of runtime, and the processes seem to hang and never return. I installed this version through conda. Is there a recommended workaround?

rizkg commented

Hi Richard,
We were able to find and fix this bug, which arises when mapping some very short reads.
We will publish the fix on this repo very soon.
Best,
Guillaume
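
Since the bug is tied to very short reads, it can help to check whether a trimmed FASTQ contains any. The following is a minimal diagnostic sketch, not part of dragmap: the helper name `count_short_reads`, the file name, and the threshold of 20 bases are all assumptions chosen for illustration, because the thread does not say how short a read has to be to trigger the crash.

```shell
# count_short_reads MIN_LEN
# Reads uncompressed FASTQ on stdin and prints how many reads are shorter
# than MIN_LEN bases. Assumes plain 4-line FASTQ records (sequence lines
# are not wrapped). Helper name and threshold are hypothetical.
count_short_reads() {
    awk -v min="$1" 'NR % 4 == 2 && length($0) < min { n++ } END { print n + 0 }'
}

# Example (file name and threshold are placeholders):
#   gzip -dc sample.trimmed.R1.fastq.gz | count_short_reads 20
```

If the count is non-zero, one possible stopgap until the fix is released is to discard such reads at the trimming step, for example with fastp's `--length_required` option. Whether that avoids the crash on a given data set is untested here.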

rizkg commented

Hi,
A fix for this issue has been pushed to the master branch.
Could you try again with the latest version from master on your data and check whether it fixes the bug you had?
Guillaume

Hi there.
I installed from master but got the same error again:

for b in $(ls *bam); do echo "/gsc/software/linux-x86_64-centos7/dragmap-1.2.1-5/bin/dragen-os -r hg38_no_alt_dragmap_ref -b ${b}  > ${b}_dragmap.sam"; done  | bash -x
+ /gsc/software/linux-x86_64-centos7/dragmap-1.2.1-5/bin/dragen-os -r hg38_no_alt_dragmap_ref -b B46157_4_lanes_dupsFlagged.bam
2022-01-20 12:33:03 	[7f177c99f7c0]	Version: 1.2.1-5-gf36d7849
2022-01-20 12:33:03 	[7f177c99f7c0]	argc: 5 argv: /gsc/software/linux-x86_64-centos7/dragmap-1.2.1-5/bin/dragen-os -r hg38_no_alt_dragmap_ref -b B46157_4_lanes_dupsFlagged.bam
decompHashTableCtxInit...
  1.741 seconds
decompHashTableHeader...
  0.002 seconds
decompHashTableLiterals...
  3.795 seconds
decompHashTableExtIndex...
  0.077 seconds
decompHashTableAutoHits...
  28.186 seconds
decompHashTableSetFlags...
  3.060 seconds
finished decompress
Running fastq workflow on 144 threads. System supports 144 threads.
0	249	0	0	0	0	10000	1	40000	1	1000	0	0	0	6	
0	250	0	0	0	0	10000	1	40000	1	1000	0	0	0	5	
0	251	0	0	0	0	10000	1	40000	1	1000	0	0	0	4	
0	252	0	0	0	0	10000	1	40000	1	1000	0	0	0	3	
0	253	0	0	0	0	10000	1	40000	1	1000	0	0	0	2	
0	254	0	0	0	0	10000	1	40000	1	1000	0	0	0	1	
[139737098520320]	ERROR: This thread caught an exception first

rizkg commented

Hi, thanks for checking.
I am working on it.

rizkg commented

Hi, a new fix was pushed to the master branch. Could you check again on your data?
Thanks,
Guillaume

Looks like I still get an error:

dragen-os -r hg38_no_alt_dragmap_ref -b B46157_4_lanes_dupsFlagged.bam
2022-02-03 10:28:41 	[7f320a6287c0]	Version: 1.2.1-7-gc87d93aa
2022-02-03 10:28:41 	[7f320a6287c0]	argc: 5 argv: /gsc/software/linux-x86_64-centos7/dragmap-1.2.1-7/bin/dragen-os -r hg38_no_alt_dragmap_ref -b B46157_4_lanes_dupsFlagged.bam
decompHashTableCtxInit...
  1.505 seconds
decompHashTableHeader...
  0.002 seconds
decompHashTableLiterals...
  3.205 seconds
decompHashTableExtIndex...
  0.070 seconds
decompHashTableAutoHits...
  23.794 seconds
decompHashTableSetFlags...
  1.850 seconds
finished decompress
Running fastq workflow on 144 threads. System supports 144 threads.
0	249	0	0	0	0	10000	1	40000	1	1000	0	0	0	6	
0	250	0	0	0	0	10000	1	40000	1	1000	0	0	0	5	
0	251	0	0	0	0	10000	1	40000	1	1000	0	0	0	4	
0	252	0	0	0	0	10000	1	40000	1	1000	0	0	0	3	
0	253	0	0	0	0	10000	1	40000	1	1000	0	0	0	2	
0	254	0	0	0	0	10000	1	40000	1	1000	0	0	0	1	
[139851179972352]	ERROR: This thread caught an exception first

I have permission to share the data with you if it helps.

rizkg commented

Hi Richard,
Yes, that would be very helpful!
How big is it?

The BAM and the reference I am using would come to about 52 GB in total.

Hi @rizkg ,
Have you had any luck reproducing my error? I am under some pressure at my center to get this up and running, so please let me know if there is anything else I can provide.

Also, do you think it would help if I tried running your binary directly (or in a container)?

rizkg commented

Hello,
Yes I have been able to reproduce the error.
It does not seem to come from your hashtable or from your binary. The problem seems to be in the bam parsing code.
As a temporary workaround, you could first convert your bam to fastq, e.g.
samtools bam2fq B46157_4_lanes_dupsFlagged.bam | gzip > file.fastq.gz
And then run dragmap with this fastq file, e.g.
dragen-os -r hg38_no_alt_dragmap_ref -1 file.fastq.gz --output-directory ./ --output-file-prefix B46157
I'll keep you posted as soon as we have a fix for this.

Thanks. Trying it out now.

rizkg commented

Hello again,
Forget what I said before: that would give you single-end mapping.
The actual issue is that we do not support BAM input sorted by coordinate; it must be sorted by read name.
So you should run, e.g.:

samtools sort --threads 16 -n B46157_4_lanes_dupsFlagged.bam > B46157_4_lanes_dupsFlagged_name_sorted.bam

And then use the name sorted bam as dragmap input, and specify --interleaved true in the dragmap options to have paired mapping.
We'll add a proper check and error message for this problem.
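
Until dragmap performs that check itself, the sort order can be inspected from the BAM header before launching a run. This is a minimal sketch: the helper name `sort_order_from_header` and the file names are hypothetical, and it only trusts the `SO:` tag on the `@HD` header line, which may be missing or out of date in some files.

```shell
# sort_order_from_header
# Reads SAM header text on stdin and prints the SO: value of the @HD line
# (e.g. "coordinate" or "queryname"), or "unknown" if no SO tag is found.
# Helper name is hypothetical; it relies only on the header being accurate.
sort_order_from_header() {
    awk -F '\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1 }
        }
        END { if (!found) print "unknown" }
    '
}

# Example (file names are placeholders):
#   so=$(samtools view -H input.bam | sort_order_from_header)
#   if [ "$so" != "queryname" ]; then
#       samtools sort -n -@ 16 -o input.name_sorted.bam input.bam
#   fi
```

A header reporting `queryname` does not prove the records really are grouped by read name, so re-sorting remains the safe option when the origin of the BAM is uncertain.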

I'm having the same problem, except that my input is paired FASTQ files instead of a BAM file.
The command looks like this:
dragen-os -r /paedyl01/disk1/yangyxt/indexed_genome/hg19 -1 /paedyl01/disk1/yangyxt/wgs/9_samples_20201202/trimmed_sequences/A160792B_1_val_1.fq.gz -2 /paedyl01/disk1/yangyxt/wgs/9_samples_20201202/trimmed_sequences/A160792B_2_val_2.fq.gz --num-threads 23 --Aligner.sec-aligns 5 --fastq-offset 30 --Aligner.sw-method dragen --verbose --RGID A160792B --RGSM A160792B --output-directory /paedyl01/disk1/yangyxt/wgs/9_samples_20201202/aligned_results --output-file-prefix A160792B

And here is the error log:
2022-04-22 17:37:05 [2b2fe2e5ee00] Version: 1.2.1
2022-04-22 17:37:05 [2b2fe2e5ee00] argc: 24 argv: dragen-os -r /paedyl01/disk1/yangyxt/indexed_genome/hg19 -1 /paedyl01/disk1/yangyxt/wgs/9_samples_20201202/trimmed_sequences/A160792B_1_val_1.fq.gz -2 /paedyl01/disk1/yangyxt/wgs/9_samples_20201202/trimmed_sequences/A160792B_2_val_2.fq.gz --num-threads 23 --Aligner.sec-aligns 5 --fastq-offset 30 --Aligner.sw-method dragen --verbose --RGID A160792B --RGSM A160792B --output-directory /paedyl01/disk1/yangyxt/wgs/9_samples_20201202/aligned_results --output-file-prefix A160792B.bqsr
decompHashTableCtxInit...
1.133 seconds
decompHashTableHeader...
0.001 seconds
decompHashTableLiterals...
1.627 seconds
decompHashTableExtIndex...
0.044 seconds
decompHashTableAutoHits...
19.191 seconds
decompHashTableSetFlags...
1.453 seconds
finished decompress
INFO: writing SAM file to "/paedyl01/disk1/yangyxt/wgs/9_samples_20201202/aligned_results/A160792B.bqsr.sam"
INFO: writing mapping metrics stats into "/paedyl01/disk1/yangyxt/wgs/9_samples_20201202/aligned_results/A160792B.bqsr.mapping_metrics.csv"
INFO: writing insert stats into "/paedyl01/disk1/yangyxt/wgs/9_samples_20201202/aligned_results/A160792B.bqsr.insert-stats.tab"
Running dual fastq workflow on 23 threads. System supports 80 threads.
Initial paired-end statistics detected for read group all, based on 88335 high quality pairs for FR orientation
Quartiles (25 50 75) = 233 300 373
Mean = 304.777
Standard deviation = 106.367
Rescue radius = 265.917
Effective rescue sigmas = 2.5
Boundaries for mean and standard deviation: low = 1, high = 653
Boundaries for proper pairs: low = 1, high = 793
NOTE: DRAGEN's insert estimates include corrections for clipping (so they are not identical to TLEN)
[47523547105024] ERROR: This thread caught an exception first
/paedyl01/disk1/yangyxt/ngs_scripts/common_bash_utils.sh: line 3651: 297460 Segmentation fault (core dumped) dragen-os -r ${ref_genome_dir} -1 ${forward_reads} -2 ${reverse_reads} --num-threads ${threads} --Aligner.sec-aligns 5 --fastq-offset 30 --Aligner.sw-method dragen --verbose --RGID ${samp_ID} --RGSM ${samp_ID} --output-directory $(dirname ${output_align}) --output-file-prefix $(basename ${output_align/.bam/})

The key lines from the error log are:
Initial paired-end statistics detected for read group all, based on 88335 high quality pairs for FR orientation
Quartiles (25 50 75) = 233 300 373
Mean = 304.777
Standard deviation = 106.367
Rescue radius = 265.917
Effective rescue sigmas = 2.5
Boundaries for mean and standard deviation: low = 1, high = 653
Boundaries for proper pairs: low = 1, high = 793
NOTE: DRAGEN's insert estimates include corrections for clipping (so they are not identical to TLEN)
[47523547105024] ERROR: This thread caught an exception first
/paedyl01/disk1/yangyxt/ngs_scripts/common_bash_utils.sh: line 3651: 297460 Segmentation fault (core dumped)

rizkg commented

Hi,
Thanks for your report.
Although this is the same error message as in the previous reports in this thread, I am not sure it has the same cause. We are working on reporting more meaningful error messages.
Meanwhile, would you be able to share your input files?

Thank you for the response!
I'm not sure I can. Even if I wanted to, the FASTQ files are huge since they are WGS samples.

Hi, I am facing the same issue. I am aligning my short reads to the SARS-CoV-2 reference genome.

dragen-os --num-threads 10 -r results/04_alignDRAGMAP/index/dragmapidx -1 data/S9_1.fastq.gz -2 data/S9_2.fastq.gz > temp.sam

2022-07-28 19:30:32 	[14afcfe29740]	Version: 1.3.0
2022-07-28 19:30:32 	[14afcfe29740]	argc: 9 argv: dragen-os --num-threads 10 -r results/04_alignDRAGMAP/index/dragmapidx -1 data/S9_1.fastq.gz -2 data/S9_2.fastq.gz
decompHashTableCtxInit...
  0.000 seconds
decompHashTableHeader...
  0.002 seconds
decompHashTableLiterals...
  0.004 seconds
decompHashTableExtIndex...
  0.000 seconds
decompHashTableAutoHits...
  0.010 seconds
decompHashTableSetFlags...
  0.004 seconds
finished decompress
Running dual fastq workflow on 10 threads. System supports 112 threads.
0	249	0	0	0	0	10000	1	40000	1	1000	0	0	0	6	
0	250	0	0	0	0	10000	1	40000	1	1000	0	0	0	5	
0	251	0	0	0	0	10000	1	40000	1	1000	0	0	0	4	
0	252	0	0	0	0	10000	1	40000	1	1000	0	0	0	3	
0	253	0	0	0	0	10000	1	40000	1	1000	0	0	0	2	
0	254	0	0	0	0	10000	1	40000	1	1000	0	0	0	1	
Segmentation fault (core dumped)

The index was created using:

samtools faidx dragmapidx/$fasta 
gatk CreateSequenceDictionary -R dragmapidx/$fasta	
dragen-os --build-hash-table true --ht-reference dragmapidx/$fasta  --output-directory dragmapidx --ht-num-threads 20
gatk ComposeSTRTableFile -R dragmapidx/$fasta -O dragmapidx/str_table.tsv

and the directory looks like this:

hash_table.cfg
hash_table.cfg.bin
hash_table.cmp
hash_table_stats.txt
reference.bin
ref_index.bin
repeat_mask.bin
sequence.dict
sequence.fasta
sequence.fasta.fai
str_table.bin
str_table.tsv

Is this bug solved in the latest version?

The latest release was on May 5th, 2022, and I reported the bug in July 2022. So no, this isn't solved yet.