nanoporetech/dorado

How to disable split reads while using dorado or alternatively split reads within the pod5 files

davidss101 opened this issue · 2 comments

Issue Report

Please describe the issue:

I'm trying to basecall my pod5 files using dorado, but I need to disable the split reads option for downstream purposes. Alternatively, it might be better for me to somehow split the reads in the pod5 files. Does anyone know how to do either of these things? Thanks in advance.

Please provide a clear and concise description of the issue you are seeing and the result you expect.

Please see the description above.

Steps to reproduce the issue:

Please list any steps to reproduce the issue.

Run environment:

  • Dorado version: I haven't run Dorado yet for this analysis.
  • Dorado command:
  • Operating system:
  • Hardware (CPUs, Memory, GPUs):
  • Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance):
  • Source data location (on device or networked drive - NFS, etc.):
  • Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB):
  • Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):

Logs

  • Please provide output trace of dorado (run dorado with -v, or -vv on a small subset)

Hi @davidss101,

Dorado does not provide an option to disable read-splitting. If you really need this, you could compile dorado from source yourself and disable it here.
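
If rebuilding is not practical, one possible workaround is to leave splitting on and regroup the split reads by their parent read after basecalling. The sketch below is a minimal illustration, not part of dorado: it assumes the output records carry the `pi:Z` tag (parent read id for a split read) and that they have been converted to SAM text. The helper names `parent_read_id` and `group_by_parent` are hypothetical.

```python
from collections import defaultdict


def parent_read_id(sam_line):
    """Return the pi:Z tag value if present, otherwise the record's own read id.

    Assumption: split reads carry a pi:Z tag naming the original (parent) read.
    """
    fields = sam_line.rstrip("\n").split("\t")
    read_id = fields[0]
    # Optional tags start after the 11 mandatory SAM columns.
    for tag in fields[11:]:
        if tag.startswith("pi:Z:"):
            return tag[len("pi:Z:"):]
    return read_id


def group_by_parent(sam_lines):
    """Map each parent read id to the list of read ids derived from it."""
    groups = defaultdict(list)
    for line in sam_lines:
        # Skip blank lines and SAM header lines.
        if not line.strip() or line.startswith("@"):
            continue
        read_id = line.split("\t", 1)[0]
        groups[parent_read_id(line)].append(read_id)
    return dict(groups)
```

The input lines could come, for example, from `samtools view` run on the basecalled BAM. Any parent id that maps to more than one read id was split during basecalling.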

Dear Malton-ont,

Thank you very much. I'm wondering if there might be a faster, easier way to deal with the reads in the original pod5 files, although I don't think the pod5 Python package has this feature.

Thanks.