Duplex from pre-basecalled BAM
How do I call "dorado duplex" with BAM as the reads input?
I ran "dorado basecaller" on my pod5 files to generate a BAM file and now want to run duplex on that BAM file. The "dorado duplex" help menu says that reads can be in BAM/SAM format. However, when I run the command, I get the error "[error] No POD5 or FAST5 reads found in path: basecalling-sup.bam". "basecalling-sup.bam" was the output of "dorado basecaller sup ./pod5/". Unfortunately, I cannot run duplex basecalling directly on the pod5 files because of time limits on our HPC.
Run environment:
- Dorado version: 0.8.2
- Dorado command:
dorado duplex sup basecalling-sup.bam > duplex.bam
Logs
[2024-11-02 12:54:16.845] [info] Running: "duplex" "-v" "sup" "basecalling-sup.bam"
[2024-11-02 12:54:18.934] [info] > No duplex pairs file provided, pairing will be performed automatically
[2024-11-02 12:54:18.935] [error] No POD5 or FAST5 reads found in path: basecalling-sup.bam
Hi @jmpolinski,
BAM / SAM input is only available for "basespace" duplex.
To use it, pass "basespace" as the model argument and set the pairs file argument.
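As a rough sketch of what that invocation might look like, the small Python helper below assembles the argument list. It assumes the pairs file is passed with "--pairs" (please confirm against "dorado duplex --help" for your version). The helper name `build_basespace_duplex_cmd` and the file name `pairs.txt` are made up for illustration.

```python
import shlex


def build_basespace_duplex_cmd(bam_path, pairs_path):
    """Return the argv for a basespace duplex run on an already-basecalled BAM.

    Assumption: the pairs file is supplied via `--pairs`; check
    `dorado duplex --help` for the exact option name in your version.
    """
    return [
        "dorado", "duplex",
        "basespace",          # used in place of a model name such as "sup"
        str(bam_path),        # BAM/SAM produced earlier by "dorado basecaller"
        "--pairs", str(pairs_path),
    ]


# Hypothetical file names; print the command rather than running it.
cmd = build_basespace_duplex_cmd("basecalling-sup.bam", "pairs.txt")
print(shlex.join(cmd) + " > duplex.bam")
```

Printing the command first makes it easy to check before submitting it as an HPC job. To run it from Python instead, the same list could be handed to `subprocess.run` with stdout redirected to the output BAM.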
Kind regards,
Rich
@HalfPhoton thanks for the info. Follow-up question: what is the pairs file?
I'm having a very difficult time finding any information on post-basecalling duplex on GitHub or in the Community documentation. If you can direct me to where this information is, I'd be very appreciative.
The pairs file is a list of all of the duplex read ID pairs; the two reads in each pair are then duplex basecalled against each other in basespace.
I believe duplex-tools has a means of generating the pairs file.
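To make the format concrete, here is a minimal sketch that reads such a file. It assumes one pair per line, with the two read IDs separated by whitespace; please check this against the file your tooling actually writes. The function name `read_pairs` and the example read IDs are hypothetical.

```python
import io


def read_pairs(handle):
    """Parse (first_read_id, second_read_id) tuples from a pairs file.

    Assumption: one pair per line, two whitespace-separated read IDs.
    """
    pairs = []
    for lineno, line in enumerate(handle, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(
                f"line {lineno}: expected 2 read IDs, got {len(fields)}"
            )
        pairs.append((fields[0], fields[1]))
    return pairs


# Made-up read IDs, only to show the expected shape of the file.
example = io.StringIO("read-aaa read-bbb\nread-ccc read-ddd\n")
print(read_pairs(example))
```

A quick check like this can catch a malformed or empty pairs file before a long job is queued.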
Kind regards,
Rich
@HalfPhoton thanks for the help. Duplex-tools does have a command for generating the pairs file, and I was able to use that with dorado duplex.