nanoporetech/dorado

The lowest GPU memory required for dorado correct?


Issue Report

I tried running Dorado correct on the CPU from a prepared PAF file, but the speed was extremely slow: about 1 GB of corrected reads in 4 hours.

I would like to know the minimum GPU memory required. Is 16 or 20 GB enough for correction? GPUs with more than 20 GB of memory are quite expensive in my country.

Best wishes!

Run environment:

  • Hardware (CPUs, Memory, GPUs): Intel Xeon 8336C, 512 GB RAM, no GPU
  • Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): PAF file from mapped raw reads
  • Source data location (on device or networked drive - NFS, etc.): Local HDD
  • Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): ~40 GB raw UL reads, ~120 GB PAF

Hi @hrluo93,

We don't have a better estimate for a minimum GPU requirement unfortunately.

As you've read, we recommend GPUs with high VRAM for Dorado Correct as it's a computationally intensive task. If you're having trouble, it is possible to reduce VRAM usage by setting the --batch-size argument during inference, but that might only go so far:

dorado correct reads.fastq --batch-size <number> > corrected_reads.fasta
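
Since you mentioned you already have a prepared PAF, a minimal sketch of running just the inference stage from it with a reduced batch size (using the --from-paf option; the batch size of 8 here is only an illustrative starting point, not a verified minimum for any particular card):

# assumes overlaps.paf was generated earlier, e.g. with: dorado correct reads.fastq --to-paf > overlaps.paf
dorado correct reads.fastq --from-paf overlaps.paf -x cuda:0 --batch-size 8 > corrected_reads.fasta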

Thanks for your reply. I am testing whether a 16 GB GPU is sufficient and will report back!

Thanks again!

I have tested on an RTX 3090 GPU with Dorado 0.8.2, using one cell of ~80 GB FASTQ (~40 GB raw UL FASTA).

  1. Running: dorado correct -x cuda:0 --infer-threads 1 -b 32 ([info] Using batch size 32 on device in inference thread 0.)
  2. Running: dorado correct -x cuda:0
    [2024-11-10 17:21:17.560] [info] Using batch size 8 on device in inference thread 0.
    [2024-11-10 17:21:17.560] [info] Using batch size 8 on device cuda:0 in inference thread 1.
    ~22 GB of GPU memory was used during correction, producing ~4 GB of corrected reads per hour.

I have also tested on an RTX 4060 Ti GPU on Linux with --infer-threads 1 -b 8: ~12 GB of GPU memory was used during correction, producing ~3 GB of corrected reads per hour.
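
In case it helps others sizing a card, VRAM headroom during a run can be watched from a second terminal, for example with nvidia-smi (the 5-second refresh interval below is arbitrary):

nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 5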

Best wishes!