nanoporetech/rerio

bonito models vs. guppy 4.5.2+

Sumsarium opened this issue · 8 comments

I get this error when I run the (latest) bonito model (res_dna_r941_min_crf_v032) in guppy 4.5.2 and 4.5.3:

terminate called after throwing an instance of 'std::runtime_error'
what(): Could not allocate shared buffer for CUDA basecaller.
Aborted (core dumped)

It works fine in e.g. guppy 4.4.2 with an RTX 3070. Just wanted to let you know.

Thanks for the report, I'll pass it on to the Guppy developers.

Hi,
We hit the same issue when running guppy 4.5.3 with bonito models, also on an RTX 3070.

Thanks for the confirmation, the Guppy team are investigating.

Just for your info, I also encountered this issue.
Changing --gpu_runners_per_device from 8 to 4 solved it for me.

Thanks, that's quite an illuminating clue. The model configurations are tuned for the GPUs in our platforms, which have 16GB of memory, whereas the RTX 3070 only has 8GB.
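As a rough starting point, the 16GB versus 8GB gap can be turned into a scaled runner count. The sketch below assumes GPU memory use grows roughly linearly with --gpu_runners_per_device, which is a guess rather than documented Guppy behaviour; `scale_runners` is a hypothetical helper, not part of Guppy. With the value of 8 mentioned above it gives 4, which matches the setting that worked on the RTX 3070. It may not be enough on its own, since chunk size and chunks per runner also appear to matter.

```shell
#!/bin/sh
# Heuristic sketch (assumption: memory use scales linearly with runner count).
# Usage: scale_runners REF_GB GPU_GB RUNNERS
#   REF_GB  - memory of the GPU the config was tuned for (16 in this thread)
#   GPU_GB  - memory of the GPU you actually have
#   RUNNERS - runners per device you would use on the reference GPU
scale_runners() {
    ref_gb=$1
    gpu_gb=$2
    runners=$3
    # Integer arithmetic: scale by the memory ratio, rounding down.
    scaled=$(( runners * gpu_gb / ref_gb ))
    # Never go below a single runner.
    if [ "$scaled" -lt 1 ]; then
        scaled=1
    fi
    echo "$scaled"
}

# 16GB reference GPU, 8GB RTX 3070, 8 runners per device
scale_runners 16 8 8
```

The printed number is only a first value to try with --gpu_runners_per_device.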

Hi there, I'm having the same issue. I previously ran guppy_basecaller (4.5.4+66c1a7753) with --chunk_size 2000 --chunks_per_runner 768 --gpu_runners_per_device 8 on the guppy model dna_r9.4.1_450bps_hac.cfg. I can't run bonito with res_dna_r941_min_crf_v031.cfg or res_dna_r941_min_crf_v032.cfg with --gpu_runners_per_device set to 8, 4 or 2. Running on a GeForce RTX 2080. Any ideas? Thanks in advance!

@GeoMicroSoares I have a GPU similar to yours, and I solved a similar issue by also lowering chunk_size and chunks_per_runner. I suggest starting with some low settings and then optimising on a small sample set you may have lying around.

~/bin/ont-guppy/bin/guppy_basecaller -i ~/foo/bar.fast5 -s ./test_sup1 --compress_fastq -x "cuda:0" -c ~/bin/ont-guppy/data/dna_r9.4.1_450bps_sup_prom.cfg --gpu_runners_per_device 2 --chunk_size 1000
ONT Guppy basecalling software version 5.0.11+2b6dbff
config file:        /home/laura/bin/ont-guppy/data/dna_r9.4.1_450bps_sup_prom.cfg
model file:         /home/laura/bin/ont-guppy/data/template_r9.4.1_450bps_sup_prom.jsn
input path:         /home/laura/lambda2_fast5
save path:          ./test_sup1
chunk size:         1000
chunks per runner:  256
minimum qscore:     10
records per file:   4000
fastq compression:  ON
num basecallers:    4
gpu device:         cuda:0
kernel path:        
runners per device: 2

Hope this helps.
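The "start low, then optimise" approach could be sketched as a dry run that only prints candidate commands, so each one can be tried by hand on a small sample. The candidate values, the `GUPPY`, `INPUT` and `CONFIG` variables, and the `./small_sample_fast5` and `./sweep_*` paths are placeholders chosen for illustration; only the flags themselves are taken from the command above.

```shell
#!/bin/sh
# Dry run: print (do not execute) one guppy_basecaller command per setting,
# starting from the lowest values. All paths and values are placeholders.
GUPPY=${GUPPY:-guppy_basecaller}
INPUT=${INPUT:-./small_sample_fast5}
CONFIG=${CONFIG:-dna_r9.4.1_450bps_sup_prom.cfg}

print_sweep() {
    for runners in 1 2 4; do
        for chunk_size in 500 1000 2000; do
            # Each candidate gets its own output directory.
            echo "$GUPPY -i $INPUT -s ./sweep_r${runners}_c${chunk_size}" \
                 "-x cuda:0 -c $CONFIG" \
                 "--gpu_runners_per_device $runners --chunk_size $chunk_size"
        done
    done
}

print_sweep
```

Run the printed commands one at a time and stop raising a setting once the "Could not allocate shared buffer" error comes back.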

I have the same issue with a Quadro RTX 4000: the HAC and Fast models work, but the SUP model does not. Reducing the chunk size to 500 allows it to run, however.