nanoporetech/dorado

Duplex from pre-basecalled BAM

Closed this issue · 4 comments

How do I call "dorado duplex" with BAM as the reads input?

I ran "dorado basecaller" on my pod5 files to generate a BAM file and now want to run duplex on that BAM file. The "dorado duplex" help menu says that reads can be in BAM/SAM format. However, when I run the command, I get the error "[error] No POD5 or FAST5 reads found in path: basecalling-sup.bam". The file "basecalling-sup.bam" was the output of "dorado basecaller sup ./pod5/". Unfortunately, I cannot run duplex basecalling on the pod5 files because of time limits on our HPC.

Run environment:

  • Dorado version: 0.8.2
  • Dorado command: dorado duplex sup basecalling-sup.bam > duplex.bam

Logs

[2024-11-02 12:54:16.845] [info] Running: "duplex" "-v" "sup" "basecalling-sup.bam"
[2024-11-02 12:54:18.934] [info] > No duplex pairs file provided, pairing will be performed automatically
[2024-11-02 12:54:18.935] [error] No POD5 or FAST5 reads found in path: basecalling-sup.bam

Hi @jmpolinski,

BAM / SAM input is only available for "basespace" duplex.

To use it, pass basespace as the model argument and supply a pairs file through the pairs file argument.

Kind regards,
Rich
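
For readers landing here, the invocation described above can be sketched as follows. This is a minimal illustration, not an official recipe: it assumes the pairs file argument is spelled `--pairs` (check `dorado duplex --help` for your version), and `pairs.txt` is a hypothetical file name. The helper only builds and prints the command; it does not run dorado.

```python
import shlex


def basespace_duplex_command(bam_path, pairs_path):
    """Build the argument list for basespace duplex on a pre-basecalled BAM.

    Assumption: the pairs file is passed with `--pairs`; verify against
    `dorado duplex --help` for the installed version.
    """
    return ["dorado", "duplex", "basespace", bam_path, "--pairs", pairs_path]


# "pairs.txt" is a hypothetical name for the pairs file.
cmd = basespace_duplex_command("basecalling-sup.bam", "pairs.txt")

# Print a shell-ready command line; the redirect mirrors the original report.
print(shlex.join(cmd) + " > duplex.bam")
```

Building the command as a list keeps paths with spaces safe if it is later handed to `subprocess.run` instead of being printed.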

@HalfPhoton thanks for the info. Follow-up question: what is the pairs file?

I'm having a very difficult time finding any information on post-basecalling duplex on GitHub or in the Community documentation. If you can direct me to where this information is, I'd be very appreciative.

The pairs file is a list of all of the duplex read ID pairs; the two reads in each pair are then duplex basecalled against each other in basespace mode.

I believe duplex-tools has a means of generating the pairs file.

Kind regards,
Rich
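
As a rough illustration of what such a pairs file might look like, the sketch below parses and sanity-checks one. It assumes a plain-text layout with two whitespace-separated read IDs per line, which should be confirmed against the dorado and duplex-tools documentation. The function name `parse_pairs` and the read IDs are made up for the example.

```python
def parse_pairs(text):
    """Parse pairs-file text into a list of (read_id, read_id) tuples.

    Assumption: one pair per line, two whitespace-separated read IDs.
    Blank lines are skipped; any other field count raises ValueError.
    """
    pairs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if len(fields) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 read IDs, got {len(fields)}"
            )
        pairs.append((fields[0], fields[1]))
    return pairs


# Placeholder read IDs; real IDs are the UUID-style names found in the BAM.
example = "read-a1 read-a2\nread-b1 read-b2\n"
pairs = parse_pairs(example)
print(len(pairs), "pairs parsed")
```

Running a check like this before a long duplex job can catch a truncated or malformed pairs file early.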

@HalfPhoton thanks for the help. Duplex-tools does have a command for generating the pairs file, and I was able to use that with dorado duplex.