spades assembly failed

Question

Opened this issue 5 months ago · 4 comments

Hello,

I am trying to run the AAFTF pipeline to assemble several Pleurotus genomes (testing on one genome for now), and I wanted to start by running it as a single pipeline. I am getting the error below. SPAdes seems to fail, but I cannot find a SPAdes .log file anywhere. What do you think?
I am running it on an HPC cluster that uses SLURM; please see the attached SLURM output and the submitted sbatch file.

Additionally,

I am not sure about the difference between these two options (see the AAFTF pipeline -h output below):
--tmpdir TMPDIR (Assembler temporary dir) and -w WORKDIR, --workdir WORKDIR (temp directory),
or how to pass parameters to SPAdes, for example different k-mer sizes, using --assembler_args ASSEMBLER_ARGS (Additional SPAdes/Megahit arguments), if that is possible.
Please let me know if you would like me to put these in a separate issue.
Thanks much!
Gian

[benucci@dev-amd20 code]$ conda activate aaftf
(aaftf) [benucci@dev-amd20 code]$ AAFTF pipeline -h
usage: AAFTF pipeline [-h] [-q] [--tmpdir TMPDIR] [--assembler_args ASSEMBLER_ARGS] [--method METHOD] -l LEFT [-r RIGHT] -o BASENAME [-c cpus]
                      [-m MEMORY] [-ml MINLEN] [-a [SCREEN_ACCESSIONS ...]] [-u [SCREEN_URLS ...]] [-it ITERATIONS] [-mc MINCONTIGLEN]
                      [--AAFTF_DB AAFTF_DB] [-w WORKDIR] [-v] -p PHYLUM [PHYLUM ...] [--sourdb SOURDB] [--mincovpct MINCOVPCT]

Run entire AAFTF pipeline automagically

options:
  -h, --help            show this help message and exit
  -q, --quiet           Do not output warnings to stderr
  --tmpdir TMPDIR       Assembler temporary dir
  --assembler_args ASSEMBLER_ARGS
                        Additional SPAdes/Megahit arguments
  --method METHOD       Assembly method: spades, dipspades, megahit
  -l LEFT, --left LEFT  left/forward reads of paired-end FASTQ or single-end FASTQ.
  -r RIGHT, --right RIGHT
                        right/reverse reads of paired-end FASTQ.
  -o BASENAME, --out BASENAME
                        Output basename, default to base name of --left reads
  -c cpus, --cpus cpus  Number of CPUs/threads to use.
  -m MEMORY, --memory MEMORY
                        Memory (in GB) setting for SPAdes. Default is Auto
  -ml MINLEN, --minlen MINLEN
                        Minimum read length after trimming, default: 75
  -a [SCREEN_ACCESSIONS ...], --screen_accessions [SCREEN_ACCESSIONS ...]
                        Genbank accession number(s) to screen out from initial reads.
  -u [SCREEN_URLS ...], --screen_urls [SCREEN_URLS ...]
                        URLs to download and screen out initial reads.
  -it ITERATIONS, --iterations ITERATIONS
                        Number of Pilon Polishing iterations to run
  -mc MINCONTIGLEN, --mincontiglen MINCONTIGLEN
                        Minimum length of contigs to keep
  --AAFTF_DB AAFTF_DB   Path to AAFTF resources, defaults to $AAFTF_DB
  -w WORKDIR, --workdir WORKDIR
                        temp directory
  -v, --debug           Provide debugging messages
  -p PHYLUM [PHYLUM ...], --phylum PHYLUM [PHYLUM ...]
                        Phylum or Phyla to keep matches, i.e. Ascomycota
  --sourdb SOURDB       SourMash LCA k-31 taxonomy database
  --mincovpct MINCOVPCT
                        Minimum percent of N50 coverage to remove

aaftf_piperun.zip

Answer 1 · 2024-04-18T20:17:21.000Z

--workdir should be where the trimmed read files go, while --tmpdir is where the SPAdes temporary files are written during assembly.

The error message from SPAdes is:
"== Warning == output dir is not empty! Please, clean output directory before run."

So maybe you need to make sure the output directory is not still there from an earlier run? Check on:
$project_dir/outputs/test_genome
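
The check suggested above can be sketched as a small shell helper. This is only an illustration: the function names are made up here, and the path to inspect is an assumption based on the `--out` value in this thread (the directory AAFTF actually hands to SPAdes may differ).

```shell
# Sketch only: report whether a directory already has content,
# which is the condition SPAdes warns about.
# Function names are hypothetical helpers, not part of AAFTF or SPAdes.

dir_has_content() {
    # Succeeds if $1 is an existing directory with at least one entry
    # (hidden files included).
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

check_outdir() {
    if dir_has_content "$1"; then
        echo "not empty: $1"
    else
        echo "ok: $1"
    fi
}
```

For example, `check_outdir "$project_dir/outputs/test_genome"` before resubmitting the job would show whether leftovers from a previous run are still in place. Removing them is left as a manual step, so nothing is deleted by accident.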

Answer 2 · 2024-04-22T15:57:01.000Z

Hello Jason,

Thank you for the email. I still get the same error after following your suggestions.

This is the error

== Warning ==  output dir is not empty! Please, clean output directory before run.


SPAdes genome assembler v3.15.5

Usage: spades.py [options] -o <output_dir>
spades.py: error: Please specify option (e.g. -1, -2, -s, etc)) for the following paths: --restart-from last

This is how I included the output directories

	AAFTF pipeline \
        ... 
	--tmpdir /mnt/scratch/benucci/aaftf_temporary/ \
	--workdir $project_dir/filtered/ \
	--out $project_dir/outputs/test_genome

and this is what I have in the directories:

[benucci@dev-amd20 project_PleurotusMartina24]$ ll outputs/
total 2.1G
-rw-r----- 1 benucci ShadeLab  184 Apr 19 17:01 spades.list
-rw-r----- 1 benucci ShadeLab 544M Apr 19 17:01 test_genome_1P.fastq.gz
-rw-r----- 1 benucci ShadeLab 568M Apr 19 17:01 test_genome_2P.fastq.gz
-rw-r----- 1 benucci ShadeLab 497M Apr 19 17:12 test_genome_filtered_1.fastq.gz
-rw-r----- 1 benucci ShadeLab 525M Apr 19 17:12 test_genome_filtered_2.fastq.gz
-rw-r----- 1 benucci ShadeLab  74K Apr 19 17:11 test_genome.mito.fasta

[benucci@dev-amd20 project_PleurotusMartina24]$ ll filtered/
total 2.0M
-rw-r----- 1 benucci ShadeLab 1.8M Apr 19 17:11 contamdb.fa
-rw-r----- 1 benucci ShadeLab 1.9K Apr 19 17:11 GCF_000819615.1_ViralProj14015_genomic.fna.gz
-rw-r----- 1 benucci ShadeLab 1.7M Apr 19 17:11 UniVec

[benucci@dev-amd20 benucci]$ ll /mnt/scratch/benucci/aaftf_temporary/
total 0

It seems that the pipeline is writing the filtered reads to the --out location instead of the --workdir.
Thank you,

Gian

Answer 3 · 2024-04-22T16:00:05.000Z

Is outputs/test_dir there already? Is outputs already made? These are SPAdes errors, possibly because a folder already exists. Just leave --workdir off, I guess. I don't use the pipeline function; I run the steps individually, so maybe you hit an untested parameter option?

Jason Stajich - UC Riverside http://lab.stajich.org


Answer 4 · 2024-05-01T15:25:47.000Z

Hello @hyphaltip

It now seems to work using just the two parameters below:

--tmpdir /mnt/scratch/benucci/aaftf_temporary \
--out test_genome

It has been running for 2 days; we'll see what I get...
Thanks,
Gian
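
For anyone landing here later, the combination that worked (a scratch `--tmpdir`, a bare basename for `--out`, and no `--workdir`) could look roughly like the dry-run sketch below. The flags are taken from the `AAFTF pipeline -h` output earlier in the thread; the helper name, the read file names, the SLURM directive value and the phylum are assumptions for illustration, and the options elided with `...` in the original command are not reconstructed.

```shell
#!/bin/sh
#SBATCH --cpus-per-task=8   # placeholder value, adjust for your cluster

# Hypothetical helper: print the AAFTF command instead of running it,
# so the flags can be inspected before submitting a long job.
# Arguments: left reads, right reads, output basename, tmpdir, phylum.
aaftf_cmd() {
    # --out is a bare basename and --workdir is deliberately omitted,
    # matching what worked in this thread.
    echo "AAFTF pipeline --left $1 --right $2 --out $3 --tmpdir $4 --phylum $5 --cpus ${SLURM_CPUS_PER_TASK:-1}"
}
```

Calling `aaftf_cmd reads_R1.fastq.gz reads_R2.fastq.gz test_genome /mnt/scratch/benucci/aaftf_temporary Basidiomycota` prints the full command line; once it looks right, the same line can be run directly in the sbatch script.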