CUDA error in NanoPreprocess

Question

albertoriva opened this issue 4 years ago · 6 comments

Hello,

I'm trying to run NanoPreprocess with guppy as the basecaller on our cluster system. I've installed the GPU guppy binaries in the correct place, but when I run the pipeline I get this error:

Command executed:

export LD_LIBRARY_PATH="/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs"
guppy_basecaller -x "cuda:0" --flowcell FLO-MIN106 --kit SQK-RNA002 --fast5_out -i ./ --save_path ./0_out --cpu_threads_per_caller 1 --gpu_runners_per_device 1 --num_callers 1
cat 0_out/*.fastq | awk '{if (NR%4==2) gsub("U","T"); print}' >> 0.fastq
rm 0_out/*.fastq
gzip 0.fastq

Command exit status:
139

Command output:
ONT Guppy basecalling software version 4.2.2+effbaf8
config file: /blue/icbrbi/apps/master_of_pores/NanoPreprocess/bin/ont-guppy/data/rna_r9.4.1_70bps_hac.cfg
model file: /blue/icbrbi/apps/master_of_pores/NanoPreprocess/bin/ont-guppy/data/template_rna_r9.4.1_70bps_hac.jsn
input path: ./
save path: ./0_out
chunk size: 2000
chunks per runner: 512
records per file: 4000
num basecallers: 1
gpu device: cuda:0
kernel path:
runners per device: 1

Command error:
WARNING: underlay of /etc/localtime required more than 50 (82) bind mounts
WARNING: underlay of /usr/bin/nvidia-smi required more than 50 (237) bind mounts
[guppy/error] *common::LoadModuleFromFatbin: Loading fatbin file shared.fatbin failed with: CUDA error at /builds/ofan/ont_core_cpp/ont_core/common/cuda_common.cpp:54: CUDA_ERROR_NO_BINARY_FOR_GPU

This is followed by a segmentation fault. I suspect the reason is that on our system the CUDA libraries are loaded with "module load", so they are not available in the environment by default, but I don't know how to test whether this explanation is correct, nor how to fix it.

Note that some of the paths mentioned in the error messages (e.g. /usr/bin/nvidia-smi, /usr/local/nvidia) don't exist on our system.

Thank you!
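
One way to test the "libraries not in the environment" hypothesis is to check whether the NVIDIA driver is reachable at all, first on the host and then inside the container. The following is only a minimal sketch: the function name `gpu_visible` is made up for illustration, and it relies on `nvidia-smi -L` (list GPUs) being installed alongside the driver.

```shell
#!/usr/bin/env bash
# Sketch: report whether the NVIDIA driver is visible in the current environment.
# gpu_visible is a hypothetical helper name, not part of guppy or the pipeline.

gpu_visible() {
    # nvidia-smi is installed with the driver; if it is missing or exits
    # non-zero, the driver is not reachable from this environment.
    command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1
}

if gpu_visible; then
    echo "GPU visible:"
    nvidia-smi -L
else
    echo "No GPU visible in this environment"
fi
```

Run it on the host after whatever "module load" step the site requires, then again inside the container, for example with `singularity exec --nv <image> bash <script>` (the image and script names are placeholders). If the GPU is listed in both places, a missing driver is unlikely to be the cause. CUDA_ERROR_NO_BINARY_FOR_GPU generally means the driver loaded but the binary contains no kernel image suitable for that GPU's architecture.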

Answer 1 · 2020-10-06T14:39:52.000Z

Actually this seems to be a local problem. Sorry for the noise.

Answer 2 · 2020-11-13T20:03:54.000Z

Dear Alberto,
Did you solve the problem with this error?
[guppy/error] *Common::LoadModuleFromFatbin: Loading fatbin file shared.fatbin failed with: CUDA error at /builds/ofan/ont_core_cpp/ont_core/common/cuda_common.cpp:54: CUDA_ERROR_NO_BINARY_FOR_GPU

My workstation has the new RTX 3080 GPU. I am using Docker with the NVIDIA Docker toolkit so that I can use CUDA 11, but guppy does not work either. :-(

Thanks for your help.

All the best,
Sascha

Answer 3 · 2021-02-23T05:31:42.000Z

Dear Alberto, Sascha, I am having the same issue. Did you find a way to resolve it? I have an RTX 3070 GPU. Funnily enough, I can GPU basecall with standalone guppy 4.4.1 but not with guppy 4.2.2, which is the version bundled with MinKNOW.
Any tricks?
Many Thanks,
Carolina

Answer 4 · 2021-02-23T09:18:42.000Z

Hi @cccorreao1, can you paste the log of your error? Are you using a version of Guppy that is built for use on a GPU?

Answer 5 · 2021-02-25T02:51:42.000Z

@lucacozzuto hi!
The issue was that the version of guppy I was using (4.2.2, as per the ONT instructions) was not compiled for CUDA 11, which is the CUDA version my GPU needs. I got the correct guppy build from a colleague and that resolved the issue. Many thanks for following up :)
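
For anyone hitting the same error, here is a rough sketch of the reasoning behind this fix: each GPU generation has a compute capability, and a binary can only carry native code for it if it was built with a recent enough CUDA toolkit. The helper name `min_cuda_for_cc` is made up for illustration, and the table lists only a few common architectures. It says nothing about which CUDA version any particular guppy build was compiled with.

```shell
#!/usr/bin/env bash
# Sketch: minimum CUDA toolkit release able to target a given compute capability.
# min_cuda_for_cc is a hypothetical helper; the table is deliberately incomplete.

min_cuda_for_cc() {
    case "$1" in
        6.0|6.1) echo "8.0"  ;;  # Pascal
        7.0)     echo "9.0"  ;;  # Volta
        7.5)     echo "10.0" ;;  # Turing
        8.0)     echo "11.0" ;;  # Ampere (A100)
        8.6)     echo "11.1" ;;  # Ampere (RTX 30 series)
        8.9)     echo "11.8" ;;  # Ada
        *)       echo "unknown"; return 1 ;;
    esac
}

# The RTX 3070 and RTX 3080 mentioned in this thread have compute capability 8.6.
min_cuda_for_cc 8.6
```

On recent drivers, `nvidia-smi --query-gpu=name,compute_cap --format=csv` should print the compute capability directly. Older drivers may not support the `compute_cap` field, in which case NVIDIA's published GPU list is the fallback.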

Answer 6 · 2021-02-25T14:55:37.000Z

You are welcome!