CUDA: unknown error in cuda_get_deviceinfo on line 535
berezinevgeniy opened this issue · 13 comments
I get an error message on startup with GeForce 610-620 and 720 cards on xmrig-nvidia 2.13+ with drivers 376.71-391.35 (the latest available for me).
- ABOUT XMRig-NVIDIA/2.14.0 MSVC/2015
- LIBS libuv/1.24.1 CUDA/9.0 OpenSSL/1.1.1a microhttpd/0.9.61
- CPU Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz x64 -AES
- GPU #0 PCI:0000:01:00 GeForce 610M @ 950/900 MHz 32x10 8x25 arch:21 SMX :1
- ALGO cryptonight, donate=5%
- POOL #1 XXXXXXX:8888 variant auto
- COMMANDS hashrate, health, pause, resume
GPU 0: unknown error
cuda_get_deviceinfo line 535
[2019-03-09 00:09:29] Setup failed for GPU 0. Exitting.
xmrig with CUDA 9 has the same problem. 2.8.3+ works fine.
You should use the CUDA 8.0 build; the Fermi architecture (compute capability 2.1) is not supported by CUDA 9.0.
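To make the compatibility point concrete, here is a minimal Python sketch that reads the `arch:NN` field from the miner's GPU banner line and compares it with the lowest compute capability a given CUDA toolkit can target. The table and the helper names (`MIN_COMPUTE_CAPABILITY`, `parse_arch`, `toolkit_supports`) are assumptions made for illustration and are not part of xmrig-nvidia. The check is a necessary condition only: later posts in this thread show the CUDA 8.0 build failing on the same card.

```python
import re

# Lowest compute capability each CUDA toolkit can still target (illustrative table).
# CUDA 8.0 was the last release that could build for Fermi (2.x).
MIN_COMPUTE_CAPABILITY = {
    "8.0": (2, 0),
    "9.0": (3, 0),
    "10.0": (3, 0),
}


def parse_arch(banner_line):
    """Extract (major, minor) from the 'arch:NN' field of a GPU banner line."""
    match = re.search(r"arch:(\d)(\d)", banner_line)
    if match is None:
        raise ValueError("no arch field in banner line")
    return int(match.group(1)), int(match.group(2))


def toolkit_supports(cuda_version, arch):
    """True if the given toolkit version can target the given (major, minor) arch."""
    return arch >= MIN_COMPUTE_CAPABILITY[cuda_version]


banner = "GPU #0 PCI:0000:01:00 GeForce 610M @ 950/900 MHz 32x10 8x25 arch:21 SMX :1"
arch = parse_arch(banner)
for version in ("8.0", "9.0"):
    verdict = "supported" if toolkit_supports(version, arch) else "not supported"
    print(f"CUDA {version}: arch {arch[0]}.{arch[1]} {verdict}")
```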
Thank you.
I have the same error on CUDA 8 :-(
- ABOUT XMRig-NVIDIA/2.14.0 MSVC/2015
- LIBS libuv/1.24.1 CUDA/8.0 OpenSSL/1.1.1a microhttpd/0.9.61
- CPU Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz x64 -AES
- GPU #0 PCI:0000:01:00 GeForce 610M @ 950/900 MHz 32x10 8x25 arch:21 SMX :1
- ALGO cryptonight, donate=5%
- POOL #1 XXXXXXX:8888 variant auto
- COMMANDS hashrate, health, pause, resume
GPU 0: unknown error
cuda_get_deviceinfo line 535
[2019-03-09 01:22:30] Setup failed for GPU 0. Exitting.
With 2.8.3 on CUDA 8 I have no problems.
The GF 740M works fine with 2.14. Only the GF 6(7,8)10-20M series fails with the new xmrig.
How can I get the error code from CUDA? I cannot compile a new binary with this part of the code (cuda_get_deviceinfo) commented out. Can you help me?
I have the same problem too :((. Please help.
@berezinevgeniy Can you provide the config too? Currently I have no idea what causes this issue: simply calling cudaGetDeviceProperties failed with unknown error, but the previous call was successful, because the GPU name and other information were detected correctly.
Thank you.
It seems to be unsupported CUDA. The latest driver for me is 391.35, and all of my problem video cards are Fermi, which is no longer supported by Nvidia. All my other video cards are on 417+ drivers and have no problems (after xmrig-nvidia 2.14.1).
P.S. xmr-stak has a similar error on startup with Fermi GeForces after the 2.10.0 update, and works with earlier versions, just like xmrig.
[2019-03-10 18:36:44] : NVIDIA: try to load library 'xmrstak_cuda_backend_cuda10_0'
WARNING: NVIDIA Insufficient driver!
WARNING: NVIDIA no device found
[2019-03-10 18:36:44] : NVIDIA: try to load library 'xmrstak_cuda_backend_cuda9_2'
WARNING: NVIDIA cannot load backend library: xmrstak_cuda_backend_cuda9_2.dll
WARNING: NVIDIA Insufficient driver!
WARNING: NVIDIA no device found
[2019-03-10 18:36:44] : NVIDIA: try to load library 'xmrstak_cuda_backend'
NVIDIA: found 1 potential device's
[2019-03-10 18:56:01] : Starting NVIDIA GPU thread 0, affinity: 36761328.
WARNING: Invalid device ID '36761376'!
[2019-03-10 18:56:01] : Setup failed for GPU 0. Exiting.
It seems something is wrong with the device ID.
I think we need to get debug info for this error. We have no full description of what happened in cudaGetDeviceProperties. Usually an unknown error has a detailed description.
If you like, I can give you TeamViewer access to a PC with a Fermi Nvidia card for testing, or you can compile a test miner that does not call cudaGetDeviceProperties twice (reusing the result from the previous call of this function, since that one works, as you see) :-)
Config.json
{
    "algo": "cryptonight",
    "api": {
        "port": 0,
        "access-token": null,
        "id": null,
        "worker-id": null,
        "ipv6": false,
        "restricted": true
    },
    "background": false,
    "colors": true,
    "cuda-bfactor": 6,
    "cuda-bsleep": 25,
    "cuda-max-threads": 64,
    "donate-level": 5,
    "log-file": "c:\\xmrig\\log.txt",
    "pools": [
        {
            "url": "xxx",
            "user": "xxx",
            "pass": "x",
            "rig-id": null,
            "nicehash": false,
            "keepalive": false,
            "variant": -1,
            "tls": false,
            "tls-fingerprint": null
        }
    ],
    "print-time": 60,
    "retries": 5,
    "retry-pause": 5,
    "threads": [
        {
            "index": 0,
            "threads": 64,
            "blocks": 4,
            "bfactor": 8,
            "bsleep": 25,
            "sync_mode": 3,
            "affine_to_cpu": false
        }
    ],
    "user-agent": null,
    "syslog": false,
    "watch": false
}
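One detail worth checking in a config like this is the Windows path in "log-file": JSON only accepts a backslash as part of an escape sequence, so a literal backslash has to be doubled. The following Python sketch shows the difference with the standard `json` module. It illustrates the JSON rule only and makes no claim about how xmrig's own parser reports the problem.

```python
import json

# Single backslashes: "\x" and "\l" are not valid JSON escape sequences.
broken = r'{"log-file": "c:\xmrig\log.txt"}'
try:
    json.loads(broken)
    broken_parses = True
except json.JSONDecodeError:
    broken_parses = False

# Doubled backslashes decode to the intended Windows path.
fixed = r'{"log-file": "c:\\xmrig\\log.txt"}'
log_path = json.loads(fixed)["log-file"]

print(broken_parses)
print(log_path)
```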
This build (the CUDA 8 version download) of xmr-stak works with Fermi. It is better than nothing, but I believe that you can help with this. Xmrig is faster, simpler, and more stable.
I modified the failure exit and also meta-miner so it will relaunch until it works. If it doesn't work, use a larger hammer...
Sometimes it's like 10 tries per success. But it works OK for now. Clocking or not makes no difference to the init crash. I did not test underclocking though. When it does run there are no invalids nor kernel crashes so I'm pretty sure the clocking is fine.
I can also confirm that xmr-stak somehow never does this at all (but its init code is basically identical?)
Only real difference is they load their backend as a DLL, while here it's linked static into the main exe. Maybe chain-loading DLL to DLL just works better than the static launch for some reason?
I never quite understood why xmr-stak refuses to static link (it "should" work "identical") but a weird unexplained side effect like this might be why? Also it's more of a CPU miner with GPU plugins so having a DLL plugin makes sense there (for runtime full disable of the GPU backends) but it seems like there are more reasons than just that.
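The relaunch-until-it-works workaround described above can be sketched in a few lines of Python. This is not the actual meta-miner change, only an illustration of the loop: `relaunch_until_success`, its parameters, and the injectable `run` callable are hypothetical names, and the stand-in below fakes two failed starts instead of launching a real miner.

```python
import subprocess
import time


def relaunch_until_success(cmd, max_attempts=20, pause_s=2.0, run=subprocess.call):
    """Run cmd repeatedly until it exits with code 0; return the attempt count.

    `run` is injectable so the loop can be exercised without a real miner.
    """
    for attempt in range(1, max_attempts + 1):
        code = run(cmd)
        if code == 0:
            return attempt
        print(f"attempt {attempt}: exited with code {code}, restarting")
        time.sleep(pause_s)
    raise RuntimeError(f"gave up after {max_attempts} failed attempts")


# Stand-in for the miner: fails twice at init, then exits cleanly.
exit_codes = iter([1, 1, 0])
attempts = relaunch_until_success(
    ["./xmrig-nvidia80", "--config=config-r.json"],
    pause_s=0.0,
    run=lambda cmd: next(exit_codes),
)
print(f"succeeded on attempt {attempts}")
```

With the default `run=subprocess.call`, the call blocks for as long as the miner keeps running, so the loop only advances when the process exits.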
>>> Starting miner: ./xmrig-nvidia80 --config=config-r.json
* ABOUT XMRig-NVIDIA/2.14.2-dev MSVC/2015
* LIBS libuv/1.23.0 CUDA/8.0 OpenSSL/1.1.1 microhttpd/0.9.59
* CPU Intel(R) Core(TM) i7-3540M CPU @ 3.00GHz x64 AES
* GPU #0 PCI:0000:01:00 NVS 5200M @ 1390/1976 MHz 10x40 6x25 arch:21 SMX:2 MEM:0/5108MiB
* ALGO cryptonight, donate=0%
* POOL #1 127.0.0.1:3334 variant=r
* API BIND [::]:10081
* COMMANDS 'h' hashrate, 'e' health, 'p' pause, 'r' resume
>>> Miner server on 127.0.0.1:3334 port connected from 127.0.0.1
>>> Pool (gulf.moneroocean.stream:ssl443) <-> miner link was established due to new miner connection
GPU 0: unknown error
cuda_get_deviceinfo line 536
[2019-03-18 12:01:30] Setup failed for GPU 0. Exiting.
!!! Miner socket error
!!! Pool (gulf.moneroocean.stream:ssl443) <-> miner link was broken due to miner socket error
!!! Miner './xmrig-nvidia80 --config=config-r.json' exited with nonzero code 1
>>> Restarting './xmrig-nvidia80 --config=config-r.json' miner that was closed unexpectedly
>>> Starting miner: ./xmrig-nvidia80 --config=config-r.json
* ABOUT XMRig-NVIDIA/2.14.2-dev MSVC/2015
* LIBS libuv/1.23.0 CUDA/8.0 OpenSSL/1.1.1 microhttpd/0.9.59
* CPU Intel(R) Core(TM) i7-3540M CPU @ 3.00GHz x64 AES
* GPU #0 PCI:0000:01:00 NVS 5200M @ 1390/1976 MHz 10x40 6x25 arch:21 SMX:2 MEM:0/5116MiB
* ALGO cryptonight, donate=0%
* POOL #1 127.0.0.1:3334 variant=r
* API BIND [::]:10081
* COMMANDS 'h' hashrate, 'e' health, 'p' pause, 'r' resume
>>> Miner server on 127.0.0.1:3334 port connected from 127.0.0.1
>>> Pool (gulf.moneroocean.stream:ssl443) <-> miner link was established due to new miner connection
GPU 0: unknown error
cuda_get_deviceinfo line 536
[2019-03-18 12:01:32] Setup failed for GPU 0. Exiting.
!!! Miner socket error
!!! Pool (gulf.moneroocean.stream:ssl443) <-> miner link was broken due to miner socket error
!!! Miner './xmrig-nvidia80 --config=config-r.json' exited with nonzero code 1
>>> Restarting './xmrig-nvidia80 --config=config-r.json' miner that was closed unexpectedly
>>> Starting miner: ./xmrig-nvidia80 --config=config-r.json
* ABOUT XMRig-NVIDIA/2.14.2-dev MSVC/2015
* LIBS libuv/1.23.0 CUDA/8.0 OpenSSL/1.1.1 microhttpd/0.9.59
* CPU Intel(R) Core(TM) i7-3540M CPU @ 3.00GHz x64 AES
* GPU #0 PCI:0000:01:00 NVS 5200M @ 1390/1976 MHz 10x40 6x25 arch:21 SMX:2 MEM:0/5119MiB
* ALGO cryptonight, donate=0%
* POOL #1 127.0.0.1:3334 variant=r
* API BIND [::]:10081
* COMMANDS 'h' hashrate, 'e' health, 'p' pause, 'r' resume
>>> Miner server on 127.0.0.1:3334 port connected from 127.0.0.1
>>> Pool (gulf.moneroocean.stream:ssl443) <-> miner link was established due to new miner connection
[2019-03-18 12:01:34] use pool 127.0.0.1:3334 127.0.0.1
[2019-03-18 12:01:34] new job from 127.0.0.1:3334 diff 845 algo cn/r height 1793551
[2019-03-18 12:02:01] speed 10s/60s/15m n/a n/a n/a H/s max n/a H/s
[2019-03-18 12:02:01] * GPU #0: 81C FAN 0%
[2019-03-18 12:02:03] accepted (1/0) diff 845 (63 ms)
[2019-03-18 12:02:17] accepted (2/0) diff 845 (69 ms)
[2019-03-18 12:02:25] speed 10s/60s/15m n/a n/a n/a H/s max n/a H/s
[2019-03-18 12:02:25] * GPU #0: 82C FAN 0%
[2019-03-18 12:02:49] speed 10s/60s/15m n/a n/a n/a H/s max n/a H/s
[2019-03-18 12:02:49] * GPU #0: 83C FAN 0%
[2019-03-18 12:03:13] speed 10s/60s/15m n/a 29.3 n/a H/s max n/a H/s
[2019-03-18 12:03:13] * GPU #0: 83C FAN 0%
[2019-03-18 12:03:25] accepted (3/0) diff 845 (61 ms)
[2019-03-18 12:03:37] speed 10s/60s/15m n/a 29.4 n/a H/s max n/a H/s
[2019-03-18 12:03:37] * GPU #0: 83C FAN 0%
[2019-03-18 12:04:01] speed 10s/60s/15m n/a 29.6 n/a H/s max n/a H/s
Also, this is a Dell laptop, hence no fan reporting (NVidiaInspector just greys that section out as not available).
So that is "normal". It might be nice for it to disappear when unsupported, like the power usage does.
Also, cn-heavy only works at around 4x4, which is low occupancy and slow (probably about 25% of ideal);
anything larger hits memory allocation failures at kernel init (once the init error has been brute-forced).
Most algos work nicely once hand-tuned. Fermi autotuning is not even close on most algos, though.
They seem to like threads set to 5 times the SMX count (10 here), with blocks then adjusted until memory allocation stops failing.
(The MEM: part of the detection line is a hack I'm working on; it obviously doesn't work yet.)
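As a rough guide for the hand-tuning above, the scratchpad memory of a launch can be estimated as threads x blocks x scratchpad size, where CryptoNight uses a 2 MiB scratchpad per hash and cn-heavy 4 MiB. The Python sketch below is an estimate only: the helper names and the 1024 MiB budget are assumptions for illustration, and the miner allocates further buffers, so the real limit is lower. It also does not explain why cn-heavy fails above 4x4 on this card.

```python
# Scratchpad size per hash in MiB for two CryptoNight-family algorithms.
SCRATCHPAD_MIB = {"cryptonight": 2, "cryptonight-heavy": 4}


def scratchpad_total_mib(threads, blocks, algo="cryptonight"):
    """Scratchpad memory needed for a threads x blocks launch, in MiB."""
    return threads * blocks * SCRATCHPAD_MIB[algo]


def max_blocks(free_mib, threads, algo="cryptonight"):
    """Largest block count whose scratchpads fit into free_mib (upper bound)."""
    return free_mib // (threads * SCRATCHPAD_MIB[algo])


smx = 2                # SMX count from the NVS 5200M banner line
threads = 5 * smx      # the "5 times SMX" rule of thumb

print(scratchpad_total_mib(threads, 40))               # 10x40 launch -> 800
print(max_blocks(1024, threads))                       # hypothetical 1024 MiB budget -> 51
print(max_blocks(1024, threads, "cryptonight-heavy"))  # -> 25
```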
PR #255 fixes this cuda_get_deviceinfo init crash, although nobody really knows why.
It comes with the memory reporting too, and support for -DCUDA_ARCH=21 by itself (2% faster on mine vs "20" code).
@berezinevgeniy don't forget to come back sometime and check this thread