5mC_5hmC and 5mCG_5hmCG basecallers swapped
samim21 opened this issue · 4 comments
Issue Report
Please describe the issue:
It appears that the 5mC_5hmC and 5mCG_5hmCG SUP modification models may be swapped in the newest release. I ran dorado 0.8.0 with the 5mC_5hmC SUP model and it only called modifications at CG sites. When I ran the 5mCG_5hmCG model instead, it output calls for all Cs.
Steps to reproduce the issue:
Here is the command I ran for the 5mC_5hmC basecalling:
dorado basecaller sup,5mC_5hmC passed_reads.pod5 > 5mC_5hmC_unmapped.bam
Here is the command I ran for the 5mCG_5hmCG basecalling:
dorado basecaller sup,5mCG_5hmCG passed_reads.pod5 > 5mCG_5hmCG_unmapped.bam
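One way to check which context a model actually called is to inspect the MM (modified-base) tag in the output BAM. The sketch below is a hypothetical, dependency-free helper (`parse_mm` and `cpg_fraction` are illustrative names, not part of dorado). It assumes the MM tag follows the SAM optional-tags specification and that the reads are unmapped, so skip counts apply directly to SEQ as stored. A CG-context model would be expected to list only CpG positions (fraction near 1.0), while an all-context model lists Cs in every sequence context.

```python
# Hypothetical diagnostic: what fraction of modification calls sit in CpG context?


def parse_mm(seq: str, mm: str) -> dict[str, list[int]]:
    """Return {mod_code: [0-based read positions]} for '+'-strand MM entries.

    Each entry is BASE STRAND CODES[.?] followed by comma-separated skip
    counts over successive occurrences of BASE in SEQ (SAM tags spec).
    """
    calls: dict[str, list[int]] = {}
    for entry in filter(None, mm.split(";")):
        header, *deltas = entry.split(",")
        base, strand, codes = header[0], header[1], header[2:].rstrip(".?")
        if strand != "+":
            continue  # opposite-strand entries (e.g. duplex) are skipped here
        # Read positions carrying the canonical base ('N' matches any base).
        candidates = [i for i, b in enumerate(seq) if base == "N" or b == base]
        positions, idx = [], -1
        for d in deltas:
            idx += int(d) + 1  # skip d occurrences, then take the next one
            positions.append(candidates[idx])
        # A ChEBI number is one code; letters such as "hm" are one code each,
        # all sharing the same listed positions.
        keys = [codes] if codes.isdigit() else list(codes)
        for k in keys:
            calls.setdefault(k, []).extend(positions)
    return calls


def cpg_fraction(seq: str, mm: str) -> float:
    """Fraction of called C positions that are immediately followed by G."""
    positions = {p for ps in parse_mm(seq, mm).values() for p in ps if seq[p] == "C"}
    if not positions:
        return float("nan")
    in_cpg = sum(1 for p in positions if seq[p + 1 : p + 2] == "G")
    return in_cpg / len(positions)


# Toy read with Cs at positions 1, 4, 7 and 9; only 1 and 7 are CpG.
read_seq = "ACGTCAACGC"
print(cpg_fraction(read_seq, "C+h?,0,1;C+m?,0,1;"))          # 1.0 (CG-only calls)
print(cpg_fraction(read_seq, "C+h?,0,0,0,0;C+m?,0,0,0,0;"))  # 0.5 (all-C calls)
```

To apply it to the BAMs above, the sequence and tag could be read with pysam (`read.query_sequence` and `read.get_tag("MM")`), or taken from `samtools view` output after stripping the `MM:Z:` prefix.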
Run environment:
- Dorado version: 0.8.0
- Dorado command:
  - dorado basecaller sup,5mC_5hmC passed_reads.pod5 > 5mC_5hmC_unmapped.bam
  - dorado basecaller sup,5mCG_5hmCG passed_reads.pod5 > 5mCG_5hmCG_unmapped.bam
- Operating system:
- Hardware (CPUs, Memory, GPUs): Nvidia GPU
- Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance):
- Source data location (on device or networked drive - NFS, etc.): pod5
- Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB):
- Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):
Logs
- Please provide output trace of dorado (run dorado with -v, or -vv on a small subset)
Hi @samim21,
Thanks for reporting this. There is indeed a mistake here: the SUP (not HAC) models for the all-context (5mC_5hmC) and CG-context (5mCG_5hmCG) calls are swapped. We are going to fix this very soon. In the meantime, running with the model names swapped (5mCG_5hmCG for all-context calls and 5mC_5hmC for CG-context calls) will work, as the underlying models are identical; only their names are swapped.
We apologise for this bug and thank you again for reporting this.
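The workaround described in the reply amounts to a name mapping that applies only to SUP in Dorado 0.8.0. Below is a minimal sketch, assuming a hypothetical `basecaller_args` helper (not part of dorado) and that the exact version string "0.8.0" identifies the affected release. HAC requests, other modification models and other versions are passed through unchanged. The helper does not inspect the model version, so it also assumes `sup,...` resolves to the affected v2 models.

```python
import shlex

# Hypothetical helper encoding the workaround: in Dorado 0.8.0 the SUP (not HAC)
# all-context and CG-context 5mC/5hmC model names are swapped.
_SWAPPED = {"5mC_5hmC": "5mCG_5hmCG", "5mCG_5hmCG": "5mC_5hmC"}


def basecaller_args(speed: str, wanted_mods: str, pod5: str, version: str) -> list[str]:
    """Return a dorado argv that should yield the *wanted* modification context."""
    mods = wanted_mods
    if version == "0.8.0" and speed == "sup":
        # Request the other name; unaffected modification models pass through.
        mods = _SWAPPED.get(wanted_mods, wanted_mods)
    return ["dorado", "basecaller", f"{speed},{mods}", pod5]


# All-context 5mC/5hmC calls with SUP on 0.8.0:
print(shlex.join(basecaller_args("sup", "5mC_5hmC", "passed_reads.pod5", "0.8.0")))
# prints: dorado basecaller sup,5mCG_5hmCG passed_reads.pod5
```

The shell redirection `> file.bam` is not part of the argument list; when running the command with `subprocess.run`, pass an open file as `stdout` instead.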
Can you confirm whether this issue affects any currently shipping version of MinKNOW, or whether it is restricted to the latest dorado release?
This only affects the v2 models in Dorado 0.8.0.