xmrig/xmrig-nvidia

Failure of cuda 8 Xmrig v2.14.3

bycarloz opened this issue · 9 comments

With 2.14.1 (CUDA 8) the hashrate was fine. Now that I have moved to 2.14.3, the hashrate blinks a lot. For example: I have a hashrate of 200 H/s, but the reading disappears every second and does not stay stable, whereas in 2.14.1 for CUDA 8 it holds steady at 200 H/s.

I no longer have the same hashrate that I had in version 2.14.1. Why? I only updated to 2.14.3. Do I have to configure a new intensity? Isn't the configuration the same as in 2.14.1? In version 2.14.3 my hashrate is much lower.

Which algo(s) do you see this with?

I have always had this issue with slower GPUs. As I understand it, they don't finish jobs within the sampling window and miss logging their hashrate stats in time for the shorter cycles. The Max and the longest window stat (5m) should still be accurate regardless of what the 10s/60s say (or don't), along with the actual effective rate on the pool side.

If there has been any missed stat push within the last 10s, or the last 10s of the current 60s, that sum will be broken (blank). Empty time-windows can't be summed and averaged.

So then if your GPU, at your provided tuning, takes over 10s to finish a work-round then it will break stats for being late on reporting. Reduce blocks*threads and eventually the stats will work great but your hashrate might not be max.
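
The reporting behaviour described above can be sketched in a few lines. This is only a model of the explanation in this comment, not the miner's actual stats code; `window_average` and the one-sample-per-second layout are assumptions made for illustration.

```python
from typing import Optional, Sequence


def window_average(samples: Sequence[Optional[float]]) -> Optional[float]:
    """Average one time window, or return None if any slot is empty."""
    if not samples or any(s is None for s in samples):
        return None  # a single missed report blanks the whole window
    return sum(samples) / len(samples)


# Hypothetical 10 s windows, one slot per second; None marks a missed report.
steady = [200.0] * 10
late = [200.0] * 7 + [None] * 3

print(window_average(steady))  # 200.0
print(window_average(late))    # None
```

A GPU that reports late leaves `None` slots in the short windows, so the 10s figure goes blank even though the card is still hashing.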

I was able to tune mine to show stats properly, and still have nice rates, the actual best blocks and threads were nowhere near using all the VRAM though. Trying to max the memory allocation basically runs the old GPUs out of power to quickly respond (overutilized). I found a point where raising blocks*threads stopped getting faster hashrate and quit there, and then stats came back to normal reporting. It is nowhere near the crash-point of maximum memory allocation, that's for sure.

This is what I'm using for the cryptonight-heavy variant tube, which gets 40 H/s:

        {
            "index": 0,
            "threads": 12,
            "blocks": 20,
            "bfactor": 8,
            "bsleep": 25,
            "sync_mode": 3,
            "affine_to_cpu": 1
        }

Once you specify which algo/variant, I can tell you my config for that algo.

For the most part it is similar to above other than 40 blocks (normal CN is half the size per scratchpad, so double the blocks and keep threads the same).
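
The "half the scratchpad, double the blocks" rule above is just keeping threads * blocks * scratchpad size constant. A minimal sketch, assuming the commonly cited scratchpad sizes of 4 MiB per hash for cryptonight-heavy and 2 MiB for normal CN; `rescale_blocks` is a hypothetical helper, not part of the miner.

```python
def rescale_blocks(blocks: int, old_scratchpad_mib: int, new_scratchpad_mib: int) -> int:
    """Return the block count that keeps threads * blocks * scratchpad constant."""
    return blocks * old_scratchpad_mib // new_scratchpad_mib


# Assumed sizes: 4 MiB per hash (cryptonight-heavy), 2 MiB per hash (normal CN).
print(rescale_blocks(20, 4, 2))  # 40
```

Threads stay at 12 in both cases, so only the block count changes.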

* GPU #0 PCI:0000:01:00 NVS 5200M @ 1466/2000 MHz 12x20 8x25 arch:21 SMX:2 MEM:967/1024 MiB
I am also overclocked by quite a bit.

The algorithm is Monero's Cryptonight-R. This is happening to me with xmr-stak in its latest version (XMR-Stak 2.10.4) and in xmrig v2.14.3 CUDA 8. For example: my hashrate was 200 H/s with the two earlier versions (xmr-stak 2.10.2 and xmrig 2.14.1). After updating to the latest miners, my hashrate is below 100 H/s, sometimes around 70 H/s.

Another problem: the 10-second reading keeps blinking, while the 60-second reading stays fixed. I tried lowering and raising the values around "threads": 12, "blocks": 39, but I still get a lower hashrate and it still blinks every 10 seconds. This started with the latest versions of both miners.

I had to go back to the previous versions.

@Spudz76

Oh right, did you have an arch:30 or such? I think those may still have some problems with R (but I only know that from seeing @psychocrypt mention it somewhere else). I am sure you told me your GPU model over on the xmr-stak issues at some point, but of course I can't recall it. Also, that may have been patched since then. Not sure, so take all of that as rumor.

I also run a 35 (GTX970) but it doesn't seem to have problems although I compile with the exact arch number not just 30 (the release, and default compile, will build just the root-archs divisible by 10 and not the sub-archs). I could try a more generic build with the 30 code and see if it runs funny on this 35, as a cross test. I don't think I've seen any dropouts on that GPU though it should be plenty fast to not hit the timing/choke issue, assuming I don't overload it with threads.

I also compiled for 21 on the above GPU rather than just 20, I do believe it was weird on 20 as far as display of stats also. Better hashrate too as far as I can tell. Some sub-archs don't really change much, but some of them add registers and whatever which can be beneficial (using root-arch would cap the register quantity to the "worst" in the family, etc) The dynamic kernels (random math) for CN-R should always build for the exact arch as they are built on the fly (with NVRTC) but I am not completely sure without checking the code. It may pass thru the original root-arch, rather than letting NVRTC decide based on local runtime detected arch number, as a function of passing the compile options (like large-grid mode etc). And in that case my forced single arch compiles would also force that to be the sub-arch.

xmr-stak would compile for all archs supported by the CUDA version, root and sub, however I think if the root ones are available CUDA doesn't always load the more specific ones - compiling single arch definitely forces it to do nothing but that subarch though which is why I tend to do so. Docs for CUDA etc would say it should act correctly but... docs always say that? Not giving it a chance to choose wrong can only help, if only by reducing what-ifs and hiding bugs

But also none of that matters if your GPU is actually just a root-arch.

nvidia.txt

This is the same configuration that I am using in xmr-stak 2.10.2, where my hashrate is stable. With the same configuration in xmr-stak 2.10.4 the hashrate is flashing and low, as I mentioned.

The same goes for xmrig v2.14.3 with CUDA 8.

bfactor should not be lower than 8 on arch 2x; that could be most of it.
It's undocumented in xmr-stak, but it is in the code here and should throw a warning.
It should work OK as 6 on the 50 though (where you have 8 now),
although I have not run a 50 with 20's in a while. I did have issues mixing them in the past, but in those cases only one 'family' would show up in CUDA (the 50 would be missing, or the 20's).
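
A quick way to check a config against the rule of thumb above. This is a sketch only: the threshold of 8 for arch 2x comes from this comment, the `arch` field and the `bfactor_warnings` helper are hypothetical (the miner config itself has no `arch` key), and the sample entries are illustrative rather than taken from the attached file.

```python
from typing import Dict, List


def bfactor_warnings(gpus: List[Dict[str, int]], min_bfactor_arch2x: int = 8) -> List[str]:
    """Flag arch 2x entries whose bfactor is below the suggested minimum."""
    warnings = []
    for gpu in gpus:
        if 20 <= gpu["arch"] < 30 and gpu["bfactor"] < min_bfactor_arch2x:
            warnings.append(
                f"GPU #{gpu['index']}: arch {gpu['arch']} has bfactor "
                f"{gpu['bfactor']}, suggested minimum is {min_bfactor_arch2x}"
            )
    return warnings


# Illustrative entries only.
sample = [
    {"index": 0, "arch": 50, "bfactor": 6},
    {"index": 1, "arch": 21, "bfactor": 6},
    {"index": 2, "arch": 20, "bfactor": 8},
]

for line in bfactor_warnings(sample):
    print(line)
```

Only the arch 21 entry is flagged here; the arch 50 card is allowed to run at 6.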

Also, after trying just the bfactor change, try the config below. Sticking to multiples of SMX usually works out well for me (the better the GPU, the higher the multiple), and on arch 50 or better I usually get better results from flipping the larger number over to threads rather than blocks. This is somewhat of a guess / starting point: try +/- several SMX steps, going up until you run out of memory (crash on init) and then down until it slows by several H/s, to find the peak. 20/21 should run out of processor before you get anywhere near their memory cap due to bandwidth (slim bus and slower clock than newer models), and CN-R is definitely much harder on the processor, so there is less gain from blocking out all the memory.

"gpu_threads_conf" :
[
  // gpu: GeForce GTX 750 Ti architecture: 50
  //      memory: 1962/2048 MiB
  //      smx: 5
  { "index" : 0,
    "threads" : 60, "blocks" : 15,
    "bfactor" : 6, "bsleep" :  25,
    "affine_to_cpu" : false, "sync_mode" : 3,
    "mem_mode" : 1,
  },
  // gpu: GeForce GTX 460 architecture: 21
  //      memory: 950/1024 MiB
  //      smx: 7
  { "index" : 1,
    "threads" : 14, "blocks" : 21,
    "bfactor" : 8, "bsleep" :  25,
    "affine_to_cpu" : false, "sync_mode" : 3,
    "mem_mode" : 1,
  },
  // gpu: GeForce GTX 570 architecture: 20
  //      memory: 1179/1280 MiB
  //      smx: 15
  { "index" : 2,
    "threads" : 30, "blocks" : 45,
    "bfactor" : 8, "bsleep" :  25,
    "affine_to_cpu" : false, "sync_mode" : 3,
    "mem_mode" : 1,
  },

],
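
The search described above the config (step blocks in multiples of SMX, stop at the out-of-memory crash, keep the peak) could be automated roughly like this. It is a sketch under stated assumptions: `tune_blocks` and `fake_measure` are hypothetical, the scores are arbitrary units from a made-up curve rather than benchmarks, and a real `measure` would have to launch the miner and read back its hashrate.

```python
from typing import Callable, Optional, Tuple


def tune_blocks(
    smx: int,
    threads: int,
    measure: Callable[[int, int], Optional[float]],
    max_multiple: int = 16,
) -> Tuple[int, float]:
    """Try blocks = smx, 2*smx, ... and return (blocks, score) at the peak.

    measure(threads, blocks) returns a score, or None when the miner
    fails to initialise (out of memory).
    """
    best_blocks, best_score = 0, 0.0
    for multiple in range(1, max_multiple + 1):
        blocks = smx * multiple
        score = measure(threads, blocks)
        if score is None:  # crash on init: memory cap reached, stop climbing
            break
        if score > best_score:
            best_blocks, best_score = blocks, score
    return best_blocks, best_score


def fake_measure(threads: int, blocks: int) -> Optional[float]:
    """Made-up curve: rises to a peak, then degrades, then 'crashes'."""
    hashes = threads * blocks
    if hashes > 700:
        return None
    if hashes <= 300:
        return float(hashes)
    return float(300 - (hashes - 300) // 10)


print(tune_blocks(smx=7, threads=14, measure=fake_measure))  # (21, 294.0)
```

With the made-up curve the peak sits well below the point where initialisation fails, which is the situation described for arch 20/21.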

Architecture 50: lowering bfactor to 6 gave me a 17% increase in hashrate, which was great. Architecture 20/21: raising bfactor to 8 cost me 10% of my hashrate; with bfactor 6 I get my current hashrate back.

The miners I ran these tests on were xmr-stak 2.10.2 and xmrig 2.14.1 CUDA 8.

The latest versions of the two miners still have the same problem: they are not stable and I get a low hashrate, between 70 and 100 H/s, whereas the previous versions give me a hashrate 100 or even 200 H/s higher.

@Spudz76 A question: which driver version should I use for CUDA 8?

error

xmrig 2.14.1 cuda 8

Table for driver vs CUDA versions is here

So, >= 376.51 but also < 385.54 would be CUDA8.0GA2 and matches that toolkit.
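
The range above can be checked mechanically. A minimal sketch: the two bounds are the ones quoted in this comment, while `parse_version` and `matches_cuda80_ga2` are hypothetical helper names and the sample driver strings are arbitrary test inputs.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '376.51' into (376, 51) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def matches_cuda80_ga2(driver: str, minimum: str = "376.51", upper: str = "385.54") -> bool:
    """True when minimum <= driver < upper, the range quoted above."""
    return parse_version(minimum) <= parse_version(driver) < parse_version(upper)


print(matches_cuda80_ga2("376.51"))  # True
print(matches_cuda80_ga2("384.94"))  # True
print(matches_cuda80_ga2("385.54"))  # False
```

Comparing tuples of integers avoids the string-comparison trap where "376.9" would sort after "376.51".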