/Dorado_Basecalling

GNU General Public License v3.0GPL-3.0

How to re-basecall old Nanopore runs with Dorado?

AUTHOR: Dr Asad Prodhan https://asadprodhan.github.io/

License GPL 3.0



Re-basecalling old Nanopore runs with Dorado.



Install softwares

Create and activate a conda environment

conda create -n dorado
conda activate dorado

Install Nextflow and Singularity

  • Install Nextflow
conda install -c bioconda nextflow
  • Run the following command to make sure that Nextflow has been installed
nextflow -h

If you see the Nextflow options, then the Nextflow has been installed

  • Install Singularity
conda install -c conda-forge singularity
  • Run the following command to make sure that Singularity has been installed
singularity -h

Install pod5

pip install pod5

Add the pod5 executable path to PATH variable permanently

export PATH=$PATH:/path/to/the/pod5/executable

Install Dorado basecaller

wget https://cdn.oxfordnanoportal.com/software/analysis/dorado-0.6.2-linux-x64.tar.gz

More deatils: https://github.com/nanoporetech/dorado?tab=readme-ov-file

tar -zvxf dorado-0.6.2-linux-x64.tar.gz

Add the dorado executable path to PATH variable permanently

nano .bashrc
  • Add the following line to the .bashrc profile
export PATH=$PATH:/xx/xx/bin/dorado-0.6.2-linux-x64/bin


Analyse data

Convert the fast5 files into pod5

pod5 convert fast5 sample01.fast5 --output sample01.pod5

Loop function:

for FILE in ./fast5/*.fast5; do FILENAME=$(basename "$FILE" .fast5); pod5 convert fast5 "$FILE" --output ./pod5/${FILENAME}.pod5; done

** It can create the pod5 directory itself**


Basecall from the pod5 file

https://www.youtube.com/watch?v=wBDU0X1ZCco

dorado basecaller sup --emit-fastq barcode07.pod5 > barcode07.fastq

DNA model and its latest version auto selected when you select the mode-fast,hac,sup

Dorado generates unaligned bam file by default. Many useful information about each read are added in the bam tags, which can be extracted by the Dorado summary command.

If you want to generate fastq file instead of bam, the use the "--emit-fastq" flag

Loop function [DNA model and its latest version auto selected]:

for FILE in ./pod5/*.pod5; do FILENAME=$(basename "$FILE" .pod5); dorado basecaller sup --emit-fastq "$FILE" > ./fastq/${FILENAME}.fastq; done

If you want to hand-pick the DNA model for your old Nanopore runs, then follow the following steps

Loop function [DNA model hand-picked]:

First, download the model of your interest:

dorado download --list
dorado download --model dna_r9.4.1_e8_sup@v3.6

Then run the loop function:

for FILE in ./pod5/*.pod5; do FILENAME=$(basename "$FILE" .pod5); dorado basecaller --recursive dna_r9.4.1_e8_sup@v3.6 --emit-fastq "$FILE" > ./fastq/${FILENAME}.fastq; done

It can't create the fastq directory, need to create one beforehand


If you want to collect fastq read summary

dorado summary barcode07.bam > barcode07_seq_info.tsv

Alternatively, you can use the fastqc program


Dorado de-multiplexing

For de-multiplexing, all files need to be concatenated first. Otherwise, the loop function replaces the previous files

dorado demux --kit-name SQK-RBK004 --emit-fastq --output-dir ./barcodes beforeDemuxed.fastq

Loop function to check record numbers in each fastq file

for FILE in ./*.fastq; do grep -c "@" "$FILE"; done


Further reading

https://github.com/nanoporetech/dorado#duplex

https://nanoporetech.com/platform/accuracy/duplex