This is a Nanopore assembly pipeline for reads generated with LSK109 chemistry on R9 series flowcells.
It does some basic QC by removing control DNA, which is sometimes used during a run to debug potential problems but should not end up in the final assembly.
Assembly uses two different assemblers, Flye and nextDenovo. Both are very fast and have different strengths. I have found that the nextDenovo assembly is better overall, with fewer contigs, but it tends to trim telomeres and sometimes loses the mitogenome. Flye, however, is great at maintaining telomeres, and the mitogenome tends to fall out as a single, non-concatenated contig (other assemblers create tandem copies because they don't expect a circular sequence). Canu is used to generate corrected reads, which I use to manually check and curate the assemblies in Geneious.
Currently this pipeline is optimised to run on a Nimbus instance with 16 cores and 64 GB of RAM.
The pipeline requires you to basecall your raw fast5 or pod5 files with dorado and then compress the reads with gzip. The files need to be named `SampleID.fastq.gz`, where `SampleID` is whatever you want to call your sample. This ID is used throughout the pipeline to name files.
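As a quick check before starting a run, the naming rule above can be sketched in shell. The function names `sample_id_from_file` and `list_sample_ids` are hypothetical helpers, not part of the pipeline, and stripping the `.fastq.gz` suffix is an assumption about how the sample ID is derived from the file name.

```shell
# Hypothetical helper (not part of the pipeline): derive the sample ID
# from a read file name, assuming the ID is the name minus ".fastq.gz".
sample_id_from_file() {
    local name
    name="$(basename "$1")"
    case "$name" in
        *.fastq.gz) printf '%s\n' "${name%.fastq.gz}" ;;
        *) echo "unexpected file name: $name" >&2; return 1 ;;
    esac
}

# Print one sample ID per *.fastq.gz file in a folder (default: reads/).
list_sample_ids() {
    local dir="${1:-reads}" f
    for f in "$dir"/*.fastq.gz; do
        [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
        sample_id_from_file "$f"
    done
}
```

Running `list_sample_ids reads` lists the IDs that would be picked up under this assumption, which makes it easy to spot files that do not follow the expected pattern.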
If you are re-basecalling old fast5 or pod5 data generated with LSK109 chemistry on an R9 series flowcell:
- First download the latest model. This is the last available model for LSK109 chemistry on R9 flowcells:
dorado download --model dna_r9.4.1_e8_sup@v3.6
- Then run basecalling:
dorado basecaller dna_r9.4.1_e8_sup@v3.6 pod5s/ --emit-fastq > sampleID.dorado.fastq && \
gzip -9 sampleID.dorado.fastq
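If you have several samples, the two commands above can be wrapped so that the output lands directly under the `SampleID.fastq.gz` name the pipeline expects. This is only a sketch: the function name `basecall_sample`, its argument order, and the default `reads` output folder are assumptions for illustration, while the dorado and gzip calls are the ones shown above.

```shell
# Hypothetical wrapper around the basecalling commands shown above.
# Usage: basecall_sample <SampleID> <pod5_dir> [reads_dir]
basecall_sample() {
    local sample_id="$1" pod5_dir="$2" reads_dir="${3:-reads}"
    local model="dna_r9.4.1_e8_sup@v3.6"

    mkdir -p "$reads_dir" || return 1

    # Basecall to an uncompressed FASTQ, then compress it only on success.
    dorado basecaller "$model" "$pod5_dir" --emit-fastq > "$reads_dir/$sample_id.fastq" \
        && gzip -9 "$reads_dir/$sample_id.fastq"
}
```

For example, `basecall_sample Sample1 pod5s/` would leave `reads/Sample1.fastq.gz` ready for the pipeline.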
If you have the compute resources available, you can try to correct the reads with the new dorado correct module. The input reads must not be gzipped for this:
dorado correct sampleID.dorado.fastq > sampleID.dorado.corrected.fasta
nextflow run jwdebler/nanopore_LSK109_assembly -resume -latest -profile docker,nimbus --reads "reads/"
We have a few profiles available to customise how the pipeline will run.
- `nimbus` sets the canu assembler to use 15 CPUs and 60GB RAM.
- `zeus` sets the canu assembler to use 14 CPUs and 64GB RAM, and sets some cluster specific options to use the slurm based scheduler at Pawsey.
- `docker` and `docker_sudo` set it to use docker containers. `docker_sudo` is identical except that docker is run as root (required for some installations of docker).
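If you are unsure which of the two docker profiles applies to your machine, one possible check is whether `docker info` succeeds for your current user. The helper below is a hypothetical sketch, and treating a failing `docker info` as "docker needs root" is an assumption; it can also fail for other reasons, such as the daemon not running.

```shell
# Hypothetical helper: suggest a container profile for the current user.
# Assumption: if `docker info` fails without root, docker_sudo is needed.
container_profile() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker not found" >&2
        return 1
    elif docker info >/dev/null 2>&1; then
        echo docker
    else
        echo docker_sudo
    fi
}

# Possible usage, combined with the nimbus profile:
#   nextflow run jwdebler/nanopore_LSK109_assembly -resume -latest \
#       -profile "$(container_profile),nimbus" --reads "reads/"
```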
--reads <glob>
Required
A folder containing one file per sample.
The basename of the file is used as the sample ID.
Example of file names: `Sample1.fastq.gz`, `Sample2.fastq.gz`.
(Default: a folder called `reads/`)
--genomeSize <glob>
Not required
Size of genome, for example "42m" (Default: 42m)
--medakaModel <glob>
Not required
Which basecaller model was used?
r941_min_sup_g507 (kit109, sup)
(Default: r941_min_sup_g507)
--minlen
Min read length to keep for assembly
(Default: 1000)
--quality
Min read q-score to keep for read filtering
(Default: 10)
--outdir <path>
The directory to store the results in.
(Default: `assembly`)
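To show how the parameters fit together, here is a sketch of a run with every documented parameter written out at its default value. The wrapper function `run_assembly` and its two positional arguments are hypothetical; the flags and values are the ones listed above, and the `docker,nimbus` profile pair is taken from the example command.

```shell
# Hypothetical wrapper: run the pipeline with all documented parameters
# set explicitly to their defaults.
# Usage: run_assembly [reads_dir] [outdir]
run_assembly() {
    local reads="${1:-reads/}" outdir="${2:-assembly}"

    nextflow run jwdebler/nanopore_LSK109_assembly \
        -resume -latest \
        -profile docker,nimbus \
        --reads "$reads" \
        --genomeSize 42m \
        --medakaModel r941_min_sup_g507 \
        --minlen 1000 \
        --quality 10 \
        --outdir "$outdir"
}
```

Calling `run_assembly` with no arguments should behave like the short command shown earlier, since every value matches its default; change individual values (for example `--genomeSize`) to suit your organism.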