CUDA NCCL library issue
francomarianardini opened this issue · 0 comments
Hello,
I cloned this repository because I am interested in running run_inference.sh. I followed the steps listed in the README. However, when I run run_inference.sh, I get the following error:
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'std::runtime_error'
what(): NCCL error in: /pytorch/torch/lib/c10d/../c10d/NCCLUtils.hpp:155, unhandled cuda error, NCCL version 2.7.8
ncclUnhandledCudaError: Call to CUDA function failed.
My system has NCCL v2.7.8 correctly installed with the corresponding CUDA toolkit.
What am I missing here?
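For anyone trying to reproduce this, the following is a minimal sketch of how to print the CUDA and NCCL versions that PyTorch itself reports, so they can be compared with the system install described above. It assumes PyTorch is importable in the same environment that run_inference.sh uses. The helper name report_versions is hypothetical and not part of this repository. The NCCL version PyTorch reports may come from the PyTorch binary rather than from the system-wide NCCL, so the two are worth comparing.

```python
import importlib.util


def report_versions():
    """Collect the CUDA/NCCL versions that PyTorch itself reports.

    Hypothetical helper for illustration; not part of this repository.
    """
    info = {"torch": None, "cuda_build": None, "cuda_available": False, "nccl": None}

    # If PyTorch is not installed here, return the empty report instead of failing.
    if importlib.util.find_spec("torch") is None:
        return info

    import torch

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = bool(torch.cuda.is_available())
    try:
        # NCCL version as seen by PyTorch; may be unavailable on some builds.
        info["nccl"] = torch.cuda.nccl.version()
    except Exception:
        pass
    return info


if __name__ == "__main__":
    for key, value in report_versions().items():
        print(f"{key}: {value}")
```

Running the script in the environment used for inference and attaching its output to the issue would show whether the CUDA build and the NCCL version match what is installed on the system.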
Thanks in advance for the help.
Best,
Franco Maria