lh3/miniasm

Segmentation fault happened at step3

Closed this issue · 9 comments

Hi Heng,

I have a problem running miniasm to assemble a plant genome. The genome is nearly 400 Mb and was sequenced with PacBio to a depth of 85x. Before running miniasm, I used minimap to compute the overlaps. miniasm then crashes with a segmentation fault at step 3 (log below). Do you have any suggestions for solving this problem?

Thanks,
Wen-Biao

[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::10096.666*1.00] read 2876324180 hits; stored 4112957381 hits and 3335749 sequences (31132741397 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::11325.809*1.00] 3223555 query sequences remain after sub
[M::ma_hit_cut::11845.499*1.00] 3940583267 hits remain after cut
[M::ma_hit_flt::12142.079*1.00] 2353809445 hits remain after filtering; crude coverage after filtering: 465.37
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::12272.969*1.00] 3203496 query sequences remain after sub
[M::ma_hit_cut::12411.309*1.00] 2326599127 hits remain after cut
Segmentation fault

lh3 commented

Could you check if you have enough RAM? Thanks.

We have 512 GB of RAM, but I submitted my job with bsub requesting only 48 GB. Here is some of the log information:
Exited with exit code 139.
Resource usage summary:
CPU time : 725882.19 sec.
Max Memory : 126026 MB
Max Swap : 131639 MB
Max Processes : 5
Max Threads : 48

I will rerun it with more RAM.

Thanks.

The segmentation fault happened again even though I allocated 256 GB of memory. Could there be another cause?

lh3 commented

I don't know. I have to reproduce the segfault to fix it. Is this a public data set or can I debug on it?

Hi Heng,

Thanks. Unfortunately, the data is not public at the moment. I tested miniasm on the public PacBio reads from Arabidopsis thaliana (Ler); it performed well and was very fast. I have already assembled my genome successfully with the PBcR pipeline, so I am curious whether miniasm can produce a better assembly.

I have encountered a similar problem. My genome is 620 Mb and I have 84x coverage. I ran
./minimap/minimap -Sw5 -L100 -m0 -t32 reads.fasta.gz reads.fasta.gz | gzip -1 > reads.paf.gz
and obtained a file reads.paf.gz of 148,091,787,711 bytes. Then I ran
./miniasm/miniasm -f reads.fasta.gz reads.paf.gz > reads.gfa
and it failed with the following message:

[M::main] ===> Step 1: reading read mappings <===
[M::ma_hit_read::13678.392*0.99] read 4118197117 hits; stored 6246339858 hits and 4359024 sequences (49751590514 bp)
[M::main] ===> Step 2: 1-pass (crude) read selection <===
[M::ma_hit_sub::20164.461*0.99] 3981562 query sequences remain after sub
[M::ma_hit_cut::21492.442*0.99] 5457262256 hits remain after cut
[M::ma_hit_flt::23199.391*0.99] 2589838094 hits remain after filtering; crude coverage after filtering: 410.91
[M::main] ===> Step 3: 2-pass (fine) read selection <===
[M::ma_hit_sub::23309.968*0.99] 3910410 query sequences remain after sub
[M::ma_hit_cut::23765.825*0.99] 2285338707 hits remain after cut
[M::ma_hit_chimeric::24052.557*0.99] identified 390841 chimeric reads
Segmentation fault (core dumped)

My server has 512 GB of RAM, all of which is available to me. I also checked for duplicate read names with the command zcat reads.fasta.gz | perl -ne 'print "$1\n" if />(\S+)/' | sort | uniq -d, but found none. Any suggestions?

Hi Heng

Hope you are doing well.

We are trying to assemble our latest nanopore human genome data set with miniasm, but it dumps core at the point of chimeric read detection.

The data set is about 30X coverage (~100Gb FASTA), and some of the reads are very long. The server has 1 TB of RAM. Here is the gdb session on the core file:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/mnt/human/bin/miniasm/miniasm -f rel4a.fastq rel4a.paf.gz'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 ma_hit_mark_unused (d=d@entry=0x1a43010, n=n@entry=-1524641225, a=a@entry=0x7f5b928b5010) at hit.c:30
30 d->seq[a[i].qns>>32].aux = d->seq[a[i].tn].aux = 1;
(gdb) bt
#0 ma_hit_mark_unused (d=d@entry=0x1a43010, n=n@entry=-1524641225, a=a@entry=0x7f5b928b5010) at hit.c:30
#1 0x00000000004081ac in ma_hit_contained (opt=opt@entry=0x7fff0d1567d0, d=d@entry=0x1a43010, sub=sub@entry=0x7fdb8da20010, n=<optimized out>, a=a@entry=0x7f5b928b5010) at hit.c:292
#2 0x00000000004018bc in main (argc=4, argv=0x7fff0d156928) at main.c:138

Let me know if you would like the specific input files uploaded.

Best
Nick
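
A note on the backtrace above: frame #0 shows n=-1524641225, and a count of hits cannot be negative. One reading, consistent with the logs earlier in this thread, is that the real count exceeded 2^31 - 1 (2147483647) and wrapped when it passed through a 32-bit signed integer. Both earlier reports have more than that many hits after the step-3 cut (2326599127 and 2285338707). The C sketch below only illustrates the arithmetic. It is not miniasm code, the function names are made up for this example, and the thread does not show what the patch actually changes.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; none of these names exist in miniasm. */

/* Value a 64-bit count takes if only its low 32 bits are kept and then read
 * as a signed 32-bit integer (two's complement). Written with explicit
 * arithmetic so the result does not depend on implementation-defined casts. */
int32_t as_wrapped_int32(uint64_t count)
{
    uint32_t low = (uint32_t)(count & 0xFFFFFFFFu);
    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;                       /* fits: value is unchanged */
    return (int32_t)((int64_t)low - 4294967296LL); /* wraps: subtract 2^32 */
}

/* Smallest non-negative count that would appear as `seen` after such a wrap. */
uint64_t implied_count(int32_t seen)
{
    /* Converting a signed value to uint32_t is defined as reduction modulo 2^32. */
    return (uint64_t)(uint32_t)seen;
}

/* Returns 1 if the count still fits in a signed 32-bit integer, 0 otherwise. */
int fits_in_int32(uint64_t count)
{
    return count <= (uint64_t)INT32_MAX;
}
```

With these helpers, implied_count(-1524641225) gives 2770326071, so roughly 2.77 billion hits would produce the value seen in gdb. In C, a negative int compared against an unsigned loop index of the same or greater width is converted to a very large unsigned value, so a loop bounded by such a count can run past the end of the array. Whether that is the exact failure at hit.c:30 is an assumption here.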

Can confirm that proposed patch voutcn@b39d757 solves this issue. Thanks!

lh3 commented

Thanks a million, @nickloman and @voutcn. Fix committed.