mumax3 raises CUDA_ERROR_UNKNOWN
CasperSchippers opened this issue · 4 comments
Hi,
When I try to run mumax3 (version 3.10, happens both with pre-compiled binaries and when built from source) I get the following error:
Try running: sudo nvidia-modprobe -u
/home/azken/go/src/github.com/mumax/3/cuda/init.go:60 CUDA_ERROR_UNKNOWN
I tried the nvidia-modprobe suggestion, but that doesn't seem to help. I'm working on Arch Linux (Linux version 5.9.1.arch1-1), with the following driver versions:
nvidia 455.28-7
nvidia-utils 455.28-1
cuda 11.1.0-2
The nvidia-smi command gives the following output:
Wed Oct 28 11:18:56 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.28       Driver Version: 455.28       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX TITAN   Off  | 00000000:01:00.0  On |                  N/A |
| 30%   37C    P8    16W / 250W |     38MiB /  6080MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       428      G   /usr/lib/Xorg                      35MiB |
+-----------------------------------------------------------------------------+
Could you help me?
Thanks in advance!
Regards,
Casper
Hey @CasperSchippers, I ran into the same issue. Does it still persist on your setup? If so, can you check whether the following minimal example reproduces the issue for you as well?
#include <cuda.h>    // CUDA driver API: cuInit, CUDA_SUCCESS
#include <iostream>

int main()
{
    std::cout << "Hello cuda world\n";
    // cuInit has to succeed before any other driver API call; its flags argument must be 0.
    auto Err = cuInit(0);
    if (Err != CUDA_SUCCESS)
        std::cout << "Got error: " << Err << " while initializing :'(" << std::endl;
    else
        std::cout << "CUDA init success!" << std::endl;
    return 0;
}
Compile with nvcc -lcuda. Error 999 is CUDA_ERROR_UNKNOWN. I got it after upgrading the drivers, CUDA and the kernel (Debian testing), so if you are in a hurry you can try downgrading ;) If you still get it, I think there is good reason to close the issue here and open one with the distro maintainers and NVIDIA support.
Please include the dmesg output from your minimal sample run. I just looked at mine and it is pretty informative:
[15793.136747] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[15793.137060] nvidia_uvm: Unknown symbol radix_tree_preloads (err -2)
[15793.137100] nvidia_uvm: Unknown symbol set_cpus_allowed_ptr (err -2)
[15793.137148] nvidia_uvm: Unknown symbol mmu_notifier_unregister (err -2)
[15793.137268] nvidia_uvm: Unknown symbol __mmu_notifier_register (err -2)
Edit: I found a bug report. I hope people with the same issue find it helpful:
https://bugs.archlinux.org/task/68312
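The "Unknown symbol" lines in the dmesg excerpt above are the useful part: they indicate that the nvidia_uvm kernel module could not resolve symbols it needs, so it did not load. As a rough sketch for checking a log for this pattern, the helper below pulls the symbol names out of such lines. The function name unresolvedSymbols is hypothetical (it is not part of mumax3, CUDA or any NVIDIA tool), and it assumes the log lines have exactly the form shown in the excerpt.

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only: collects the symbol names from
// kernel log lines of the form
//   "[timestamp] <module>: Unknown symbol <name> (err -2)"
// Lines that do not match this form are skipped.
std::vector<std::string> unresolvedSymbols(std::istream& log, const std::string& module)
{
    const std::string marker = module + ": Unknown symbol ";
    std::vector<std::string> symbols;
    std::string line;
    while (std::getline(log, line)) {
        const auto pos = line.find(marker);
        if (pos == std::string::npos)
            continue; // not an unresolved-symbol line for this module
        const auto start = pos + marker.size();
        const auto end = line.find(' ', start); // the symbol name ends at the next space
        const auto len = (end == std::string::npos) ? std::string::npos : end - start;
        symbols.push_back(line.substr(start, len));
    }
    return symbols;
}
```

To try it, call unresolvedSymbols(std::cin, "nvidia_uvm") from a small main and pipe dmesg into the program. For the four "Unknown symbol" lines above it returns radix_tree_preloads, set_cpus_allowed_ptr, mmu_notifier_unregister and __mmu_notifier_register. A non-empty result suggests the installed module does not match the running kernel, which is consistent with the problem appearing after a kernel and driver upgrade.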
Hi @Artemkth, thanks for the reply. At the moment, I have downgraded the system because I couldn't get it to work, and people needed it urgently. Whenever the system is available to me again, I will try your suggestion.