SPAdes does not run correctly: issue with spades-hammer

Question

manotush opened this issue 7 months ago · 1 comment

Description of bug

Every time I run SPAdes to assemble a genome, it fails at the spades-hammer (read error correction) step. Please help me troubleshoot this issue.

spades.log

Command line: /home/manotush/anaconda3/bin/spades.py -o /home/manotush/raw_sequence/spades_assembly -1 /home/manotush/raw_sequence/74A_S28_L001_R1_001_trim_final.fastq.gz -2 /home/manotush/raw_sequence/74A_S28_L001_R2_001_trim_final.fastq.gz

System information:
SPAdes version: 3.15.5
Python version: 3.11.5
OS: Linux-6.5.0-25-generic-x86_64-with-glibc2.35

Output dir: /home/manotush/raw_sequence/spades_assembly
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
Standard mode
For multi-cell/isolate data we recommend to use '--isolate' option; for single-cell MDA data use '--sc'; for metagenomic data use '--meta'; for RNA-Seq use '--rna'.
Reads:
Library number: 1, library type: paired-end
orientation: fr
left reads: ['/home/manotush/raw_sequence/74A_S28_L001_R1_001_trim_final.fastq.gz']
right reads: ['/home/manotush/raw_sequence/74A_S28_L001_R2_001_trim_final.fastq.gz']
interlaced reads: not specified
single reads: not specified
merged reads: not specified
Read error correction parameters:
Iterations: 1
PHRED offset will be auto-detected
Corrected reads will be compressed
Assembly parameters:
k: automatic selection based on read length
Repeat resolution is enabled
Mismatch careful mode is turned OFF
MismatchCorrector will be SKIPPED
Coverage cutoff is turned OFF
Other parameters:
Dir for temp files: /home/manotush/raw_sequence/spades_assembly/tmp
Threads: 16
Memory limit (in Gb): 7

======= SPAdes pipeline started. Log can be found here: /home/manotush/raw_sequence/spades_assembly/spades.log

/home/manotush/raw_sequence/74A_S28_L001_R1_001_trim_final.fastq.gz: max reads length: 231
/home/manotush/raw_sequence/74A_S28_L001_R2_001_trim_final.fastq.gz: max reads length: 231

Reads length: 231

Default k-mer sizes were set to [21, 33, 55, 77] because estimated read length (231) is equal to or greater than 150

===== Before start started.

===== Read error correction started.

== Running: /home/manotush/anaconda3/bin/spades-hammer /home/manotush/raw_sequence/spades_assembly/corrected/configs/config.info

0:00:00.000 1M / 26M INFO General (main.cpp : 75) Starting BayesHammer, built from N/A, git revision N/A
0:00:00.003 1M / 26M INFO General (main.cpp : 76) Loading config from /home/manotush/raw_sequence/spades_assembly/corrected/configs/config.info
0:00:00.009 1M / 26M INFO General (main.cpp : 78) Maximum # of threads to use (adjusted due to OMP capabilities): 8
0:00:00.010 1M / 26M INFO General (memory_limit.cpp : 54) Memory limit set to 7 Gb
0:00:00.010 1M / 26M INFO General (main.cpp : 86) Trying to determine PHRED offset
0:00:00.010 1M / 26M INFO General (main.cpp : 92) Determined value is 33
0:00:00.011 1M / 26M INFO General (hammer_tools.cpp : 38) Hamming graph threshold tau=1, k=21, subkmer positions = [ 0 10 ]
0:00:00.011 1M / 26M INFO General (main.cpp : 113) Size of aux. kmer data 24 bytes
=== ITERATION 0 begins ===
0:00:00.011 1M / 26M INFO General (kmer_index_builder.hpp : 243) Splitting kmer instances into 16 files using 8 threads. This might take a while.
0:00:00.012 1M / 26M INFO General (file_limit.hpp : 42) Open file limit set to 1024
0:00:00.012 1M / 26M INFO General (kmer_splitter.hpp : 93) Memory available for splitting buffers: 0.291663 Gb
0:00:00.012 1M / 26M INFO General (kmer_splitter.hpp : 101) Using cell size of 2446644
0:00:00.013 2689M / 2689M INFO K-mer Splitting (kmer_data.cpp : 97) Processing /home/manotush/raw_sequence/74A_S28_L001_R1_001_trim_final.fastq.gz
0:00:09.921 2689M / 2689M INFO K-mer Splitting (kmer_data.cpp : 107) Processed 526559 reads
0:00:09.922 2689M / 2689M INFO K-mer Splitting (kmer_data.cpp : 97) Processing /home/manotush/raw_sequence/74A_S28_L001_R2_001_trim_final.fastq.gz
0:00:19.722 2689M / 2689M INFO K-mer Splitting (kmer_data.cpp : 107) Processed 1050593 reads
0:00:19.722 2689M / 2689M INFO K-mer Splitting (kmer_data.cpp : 112) Total 1050593 reads processed
0:00:19.722 1M / 2380M INFO General (kmer_index_builder.hpp : 249) Starting k-mer counting.
0:00:19.952 1M / 2380M INFO General (kmer_index_builder.hpp : 260) K-mer counting done. There are 16965854 kmers in total.
0:00:19.952 1M / 2380M INFO K-mer Index Building (kmer_index_builder.hpp : 395) Building perfect hash indices
0:00:20.538 15M / 2380M INFO K-mer Index Building (kmer_index_builder.hpp : 431) Index built. Total 16965854 kmers, 12265608 bytes occupied (5.78367 bits per kmer).
0:00:20.539 15M / 2380M INFO K-mer Counting (kmer_data.cpp : 354) Arranging kmers in hash map order
0:00:21.235 279M / 2380M INFO General (main.cpp : 148) Clustering Hamming graph.
0:00:42.646 279M / 2380M INFO General (main.cpp : 155) Extracting clusters:
0:00:42.647 279M / 2380M INFO General (concurrent_dsu.cpp : 18) Connecting to root
0:00:42.759 279M / 2380M INFO General (concurrent_dsu.cpp : 34) Calculating counts
0:00:46.780 552M / 2380M INFO General (concurrent_dsu.cpp : 63) Writing down entries
0:00:52.199 279M / 2380M INFO General (main.cpp : 167) Clustering done. Total clusters: 9758206
0:00:52.216 147M / 2380M INFO K-mer Counting (kmer_data.cpp : 371) Collecting K-mer information, this takes a while.
0:00:52.480 539M / 2380M INFO K-mer Counting (kmer_data.cpp : 377) Processing /home/manotush/raw_sequence/74A_S28_L001_R1_001_trim_final.fastq.gz
0:01:13.399 539M / 2380M INFO K-mer Counting (kmer_data.cpp : 377) Processing /home/manotush/raw_sequence/74A_S28_L001_R2_001_trim_final.fastq.gz
0:01:34.300 539M / 2380M INFO K-mer Counting (kmer_data.cpp : 384) Collection done, postprocessing.
0:01:34.360 539M / 2380M INFO K-mer Counting (kmer_data.cpp : 397) There are 16965854 kmers in total. Among them 8733374 (51.4762%) are singletons.
0:01:34.360 539M / 2380M INFO General (main.cpp : 173) Subclustering Hamming graph
0:01:40.121 539M / 2380M INFO Hamming Subclustering (kmer_cluster.cpp : 650) Subclustering done. Total 9 non-read kmers were generated.
0:01:40.121 539M / 2380M INFO Hamming Subclustering (kmer_cluster.cpp : 651) Subclustering statistics:
0:01:40.121 539M / 2380M INFO Hamming Subclustering (kmer_cluster.cpp : 652) Total singleton hamming clusters: 5343380. Among them 3321826 (62.1671%) are good
0:01:40.121 539M / 2380M INFO Hamming Subclustering (kmer_cluster.cpp : 653) Total singleton subclusters: 33485. Among them 33312 (99.4834%) are good
0:01:40.121 539M / 2380M INFO Hamming Subclustering (kmer_cluster.cpp : 654) Total non-singleton subcluster centers: 4437459. Among them 2727498 (61.4653%) are good
0:01:40.121 539M / 2380M INFO Hamming Subclustering (kmer_cluster.cpp : 655) Average size of non-trivial subcluster: 2.61918 kmers
0:01:40.121 539M / 2380M INFO Hamming Subclustering (kmer_cluster.cpp : 656) Average number of sub-clusters per non-singleton cluster: 1.01271
0:01:40.121 539M / 2380M INFO Hamming Subclustering (kmer_cluster.cpp : 657) Total solid k-mers: 6082636
0:01:40.121 539M / 2380M INFO Hamming Subclustering (kmer_cluster.cpp : 658) Substitution probabilities: 4,4
0:01:40.134 539M / 2380M INFO General (main.cpp : 178) Finished clustering.
0:01:40.135 539M / 2380M INFO General (main.cpp : 197) Starting solid k-mers expansion in 8 threads.
0:01:52.215 539M / 2380M INFO General (main.cpp : 218) Solid k-mers iteration 0 produced 17700 new k-mers.
0:02:04.309 539M / 2380M INFO General (main.cpp : 218) Solid k-mers iteration 1 produced 449 new k-mers.
0:02:16.488 539M / 2380M INFO General (main.cpp : 218) Solid k-mers iteration 2 produced 0 new k-mers.
0:02:16.488 539M / 2380M INFO General (main.cpp : 222) Solid k-mers finalized
0:02:16.488 539M / 2380M INFO General (hammer_tools.cpp : 222) Starting read correction in 8 threads.
0:02:16.488 539M / 2380M INFO General (hammer_tools.cpp : 235) Correcting pair of reads: /home/manotush/raw_sequence/74A_S28_L001_R1_001_trim_final.fastq.gz and /home/manotush/raw_sequence/74A_S28_L001_R2_001_trim_final.fastq.gz
0:02:20.597 1260M / 2380M INFO General (hammer_tools.cpp : 170) Prepared batch 0 of 524034 reads.
0:02:33.719 1308M / 2380M INFO General (hammer_tools.cpp : 177) Processed batch 0
0:02:34.871 1308M / 2380M INFO General (hammer_tools.cpp : 187) Written batch 0
0:02:34.871 1308M / 2380M ERROR General (hammer_tools.cpp : 191) Pair of read files /home/manotush/raw_sequence/74A_S28_L001_R1_001_trim_final.fastq.gz and /home/manotush/raw_sequence/74A_S28_L001_R2_001_trim_final.fastq.gz contain unequal amount of reads

== Error == system call for: "['/home/manotush/anaconda3/bin/spades-hammer', '/home/manotush/raw_sequence/spades_assembly/corrected/configs/config.info']" finished abnormally, OS return value: 21
None

In case you have troubles running SPAdes, you can write to spades.support@cab.spbu.ru
or report an issue on our GitHub repository github.com/ablab/spades
Please provide us with params.txt and spades.log files from the output directory.

SPAdes log can be found here: /home/manotush/raw_sequence/spades_assembly/spades.log

Thank you for using SPAdes!

params.txt

(base) manotush@manotush-pc:~/raw_sequence$ spades.py -o spades_assembly -1 74A_S28_L001_R1_001_trim_final.fastq.gz -2 74A_S28_L001_R2_001_trim_final.fastq.gz

== Warning == No assembly mode was specified! If you intend to assemble high-coverage multi-cell/isolate data, use '--isolate' option.

Command line: /home/manotush/anaconda3/bin/spades.py -o /home/manotush/raw_sequence/spades_assembly -1 /home/manotush/raw_sequence/74A_S28_L001_R1_001_trim_final.fastq.gz -2 /home/manotush/raw_sequence/74A_S28_L001_R2_001_trim_final.fastq.gz

System information:
SPAdes version: 3.15.5
Python version: 3.11.5
OS: Linux-6.5.0-25-generic-x86_64-with-glibc2.35

Output dir: /home/manotush/raw_sequence/spades_assembly
Mode: read error correction and assembling
Debug mode is turned OFF

Dataset parameters:
Standard mode
For multi-cell/isolate data we recommend to use '--isolate' option; for single-cell MDA data use '--sc'; for metagenomic data use '--meta'; for RNA-Seq use '--rna'.
Reads:
Library number: 1, library type: paired-end
orientation: fr
left reads: ['/home/manotush/raw_sequence/74A_S28_L001_R1_001_trim_final.fastq.gz']
right reads: ['/home/manotush/raw_sequence/74A_S28_L001_R2_001_trim_final.fastq.gz']
interlaced reads: not specified
single reads: not specified
merged reads: not specified
Read error correction parameters:
Iterations: 1
PHRED offset will be auto-detected
Corrected reads will be compressed
Assembly parameters:
k: automatic selection based on read length
Repeat resolution is enabled
Mismatch careful mode is turned OFF
MismatchCorrector will be SKIPPED
Coverage cutoff is turned OFF
Other parameters:
Dir for temp files: /home/manotush/raw_sequence/spades_assembly/tmp
Threads: 16
Memory limit (in Gb): 7

SPAdes version

SPAdes v3.15.5

Operating System

Linux-6.5.0-25-generic-x86_64-with-glibc2.35

Python Version

Python 3.11.5

Method of SPAdes installation

conda

No errors reported in spades.log

Yes

Answer 1 · 2024-03-29T04:56:58.000Z

The log clearly reads:

0:02:34.871 1308M / 2380M ERROR General (hammer_tools.cpp : 191) Pair of read files /home/manotush/raw_sequence/74A_S28_L001_R1_001_trim_final.fastq.gz and /home/manotush/raw_sequence/74A_S28_L001_R2_001_trim_final.fastq.gz contain unequal amount of reads

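So your input files are corrupted: they do not contain a proper set of paired-end reads, because R1 and R2 hold different numbers of reads. The k-mer splitting lines in the same log point the same way: 526559 reads were processed after R1 and 1050593 in total after R2, which suggests R2 holds 1050593 - 526559 = 524034 reads, the same figure as in "Prepared batch 0 of 524034 reads". SPAdes needs every read in R1 to have its mate in R2, so the pair of files has to be regenerated, for example by rerunning the trimming step in paired-end mode if that is where the reads were dropped.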
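
The unequal read counts can be checked outside SPAdes before rerunning the assembly. Below is a minimal Python sketch, not part of SPAdes. It assumes gzip-compressed FASTQ files with exactly four lines per record, and read names that match between R1 and R2 once an optional /1 or /2 suffix is removed. The function names read_ids and compare_pairs are made up for this illustration.

```python
import gzip
from itertools import zip_longest


def read_ids(path):
    """Yield the read name from each record of a gzip-compressed FASTQ file.

    Assumption: every record is exactly 4 lines (no wrapped sequences).
    """
    with gzip.open(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0:  # header line of each record
                tokens = line.split()
                # First token without the leading '@'
                name = tokens[0][1:] if tokens else ""
                # Drop a legacy /1 or /2 mate suffix if present
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name


def compare_pairs(r1_path, r2_path):
    """Return (reads_in_r1, reads_in_r2, index_of_first_mismatch_or_None)."""
    n1 = n2 = 0
    first_mismatch = None
    pairs = zip_longest(read_ids(r1_path), read_ids(r2_path))
    for index, (id1, id2) in enumerate(pairs):
        if id1 is not None:
            n1 += 1
        if id2 is not None:
            n2 += 1
        # Record the first position where the names differ or one file ran out
        if first_mismatch is None and id1 != id2:
            first_mismatch = index
    return n1, n2, first_mismatch


# Example call with the file names from the issue:
# n1, n2, mismatch = compare_pairs(
#     "74A_S28_L001_R1_001_trim_final.fastq.gz",
#     "74A_S28_L001_R2_001_trim_final.fastq.gz",
# )
# print(n1, n2, mismatch)
```

If the two counts differ, or a mismatch index is returned, the files are not properly paired. The mismatch index is the zero-based record number at which the two files first disagree, which can help locate where reads were dropped.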