SPAdes assembly failed
Hello,
I am trying to use AAFTF to assemble several Pleurotus genomes (testing on one genome for now), and I decided to run the full pipeline first. I am getting the error below. SPAdes seems to fail, but I cannot find a SPAdes log file anywhere. What do you think?
I am running it on an HPC cluster that uses SLURM; the SLURM output and the submitted sbatch file are attached.
Additionally:
- I am not totally sure what the difference is between these two options (see the AAFTF pipeline -h output below):
  --tmpdir TMPDIR Assembler temporary dir
  -w WORKDIR, --workdir WORKDIR temp directory
- I would also like to know how to pass parameters to SPAdes using
  --assembler_args ASSEMBLER_ARGS Additional SPAdes/Megahit arguments
  if that is possible, for example to set different k-mer sizes.
Please let me know if you would like me to put these in a separate issue.
Thanks much!
Gian
[benucci@dev-amd20 code]$ conda activate aaftf
(aaftf) [benucci@dev-amd20 code]$ AAFTF pipeline -h
usage: AAFTF pipeline [-h] [-q] [--tmpdir TMPDIR] [--assembler_args ASSEMBLER_ARGS] [--method METHOD] -l LEFT [-r RIGHT] -o BASENAME [-c cpus]
[-m MEMORY] [-ml MINLEN] [-a [SCREEN_ACCESSIONS ...]] [-u [SCREEN_URLS ...]] [-it ITERATIONS] [-mc MINCONTIGLEN]
[--AAFTF_DB AAFTF_DB] [-w WORKDIR] [-v] -p PHYLUM [PHYLUM ...] [--sourdb SOURDB] [--mincovpct MINCOVPCT]
Run entire AAFTF pipeline automagically
options:
-h, --help show this help message and exit
-q, --quiet Do not output warnings to stderr
--tmpdir TMPDIR Assembler temporary dir
--assembler_args ASSEMBLER_ARGS
Additional SPAdes/Megahit arguments
--method METHOD Assembly method: spades, dipspades, megahit
-l LEFT, --left LEFT left/forward reads of paired-end FASTQ or single-end FASTQ.
-r RIGHT, --right RIGHT
right/reverse reads of paired-end FASTQ.
-o BASENAME, --out BASENAME
Output basename, default to base name of --left reads
-c cpus, --cpus cpus Number of CPUs/threads to use.
-m MEMORY, --memory MEMORY
Memory (in GB) setting for SPAdes. Default is Auto
-ml MINLEN, --minlen MINLEN
Minimum read length after trimming, default: 75
-a [SCREEN_ACCESSIONS ...], --screen_accessions [SCREEN_ACCESSIONS ...]
Genbank accession number(s) to screen out from initial reads.
-u [SCREEN_URLS ...], --screen_urls [SCREEN_URLS ...]
URLs to download and screen out initial reads.
-it ITERATIONS, --iterations ITERATIONS
Number of Pilon Polishing iterations to run
-mc MINCONTIGLEN, --mincontiglen MINCONTIGLEN
Minimum length of contigs to keep
--AAFTF_DB AAFTF_DB Path to AAFTF resources, defaults to $AAFTF_DB
-w WORKDIR, --workdir WORKDIR
temp directory
-v, --debug Provide debugging messages
-p PHYLUM [PHYLUM ...], --phylum PHYLUM [PHYLUM ...]
Phylum or Phyla to keep matches, i.e. Ascomycota
--sourdb SOURDB SourMash LCA k-31 taxonomy database
--mincovpct MINCOVPCT
Minimum percent of N50 coverage to remove
workdir should be where the trimmed read files go, while tmpdir is where the SPAdes temporary files are written during assembly.
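Following that description, here is a bash sketch of an invocation that keeps the two kinds of files apart. The read file names, the scratch path, and the choice of Basidiomycota for --phylum (the phylum Pleurotus belongs to) are illustrative assumptions, not values from this thread. The command is built as an array and only printed, so it can be checked before it is run.

```shell
#!/usr/bin/env bash
# Sketch only: read names, paths, and the phylum are illustrative assumptions.
set -euo pipefail

project_dir=${project_dir:-$PWD}
scratch=${scratch:-/tmp/${USER:-aaftf}/aaftf_temporary}

# --workdir: trimmed/filtered read files (per the reply above)
# --tmpdir:  SPAdes temporary files written during assembly
cmd=(
  AAFTF pipeline
  --left "$project_dir/reads_R1.fastq.gz"
  --right "$project_dir/reads_R2.fastq.gz"
  --workdir "$project_dir/filtered"
  --tmpdir "$scratch"
  --out test_genome
  --phylum Basidiomycota
)

# Print the fully quoted command for review; run it with: "${cmd[@]}"
printf '%q ' "${cmd[@]}"
echo
```

On a cluster, pointing --tmpdir at scratch space keeps the large SPAdes intermediate files out of the project directory.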
The error message from SPAdes is:
"== Warning == output dir is not empty! Please, clean output directory before run."
So you may need to make sure the output directory from the previous run is no longer there. Can you check this directory?
$project_dir/outputs/test_genome
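Before rerunning, it can help to confirm that nothing from the failed attempt is left in the directory SPAdes writes to. A minimal bash sketch, assuming the leftover output is at the path suggested above; outdir_is_clean and move_aside are hypothetical helpers, not part of AAFTF. Moving the directory aside rather than deleting it keeps the old files, including any SPAdes log, available for inspection.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of AAFTF) for checking a SPAdes output
# directory left behind by an earlier, failed run.

# Succeeds if DIR does not exist, or exists as a directory with no entries.
outdir_is_clean() {
  local dir=$1
  if [[ ! -e $dir ]]; then
    return 0
  fi
  # "ls -A" lists every entry except . and ..; no output means empty
  [[ -d $dir && -z $(ls -A -- "$dir") ]]
}

# Renames DIR to DIR.old.<timestamp> instead of deleting it, so earlier
# files (e.g. a SPAdes log) stay available; prints the new name.
move_aside() {
  local dir=${1%/}
  local backup
  backup="$dir.old.$(date +%Y%m%d%H%M%S)"
  mv -- "$dir" "$backup"
  printf '%s\n' "$backup"
}

# Example use (path taken from the reply above; adjust as needed):
# target="$project_dir/outputs/test_genome"
# if ! outdir_is_clean "$target"; then
#   echo "not empty, moved to $(move_aside "$target")"
# fi
```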
Hello Jason,
Thank you for the email. I still get the same error after following your suggestions.
This is the error:
== Warning == output dir is not empty! Please, clean output directory before run.
SPAdes genome assembler v3.15.5
Usage: spades.py [options] -o <output_dir>
spades.py: error: Please specify option (e.g. -1, -2, -s, etc)) for the following paths: --restart-from last
This is how I specified the output directories:
AAFTF pipeline \
...
--tmpdir /mnt/scratch/benucci/aaftf_temporary/ \
--workdir $project_dir/filtered/ \
--out $project_dir/outputs/test_genome
And this is what is in those directories:
[benucci@dev-amd20 project_PleurotusMartina24]$ ll outputs/
total 2.1G
-rw-r----- 1 benucci ShadeLab 184 Apr 19 17:01 spades.list
-rw-r----- 1 benucci ShadeLab 544M Apr 19 17:01 test_genome_1P.fastq.gz
-rw-r----- 1 benucci ShadeLab 568M Apr 19 17:01 test_genome_2P.fastq.gz
-rw-r----- 1 benucci ShadeLab 497M Apr 19 17:12 test_genome_filtered_1.fastq.gz
-rw-r----- 1 benucci ShadeLab 525M Apr 19 17:12 test_genome_filtered_2.fastq.gz
-rw-r----- 1 benucci ShadeLab 74K Apr 19 17:11 test_genome.mito.fasta
[benucci@dev-amd20 project_PleurotusMartina24]$ ll filtered/
total 2.0M
-rw-r----- 1 benucci ShadeLab 1.8M Apr 19 17:11 contamdb.fa
-rw-r----- 1 benucci ShadeLab 1.9K Apr 19 17:11 GCF_000819615.1_ViralProj14015_genomic.fna.gz
-rw-r----- 1 benucci ShadeLab 1.7M Apr 19 17:11 UniVec
[benucci@dev-amd20 benucci]$ ll /mnt/scratch/benucci/aaftf_temporary/
total 0
It seems like it is writing the filtered reads to the --out location instead of the --workdir.
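The listing above suggests why: the help text describes --out as an "Output basename", and the file names in outputs/ look like the --out value with suffixes appended. That would make --out a file-name prefix rather than a directory, so a value containing a path places the reads in that path. This is an inference from this one listing, not from the AAFTF documentation. The bash sketch below prints where those files would land for a given --out value; predict_outputs is a hypothetical helper, and its suffix list is copied from the listing, so other pipeline stages may write additional files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the paths implied by treating --out as a prefix.
# Suffixes are copied from the outputs/ listing above (assumption: AAFTF
# appends them directly to the --out value).
predict_outputs() {
  local out=$1
  local suffix
  for suffix in _1P.fastq.gz _2P.fastq.gz _filtered_1.fastq.gz \
                _filtered_2.fastq.gz .mito.fasta; do
    printf '%s\n' "${out}${suffix}"
  done
}

# With --out $project_dir/outputs/test_genome every file lands in outputs/.
# With a bare name such as --out test_genome they would presumably land in
# the current working directory of the job.
# predict_outputs "$project_dir/outputs/test_genome"
```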
Thank you,
Gian
Hello @hyphaltip,
It seems to be working now, using just these two parameters:
--tmpdir /mnt/scratch/benucci/aaftf_temporary \
--out test_genome
It has been running for 2 days; we'll see what I get...
Thanks,
Gian