[Bug] Setting GPU index or enforce_gpu_index causes unexpected behavior
LeroyINC opened this issue · 7 comments
What happened?
I have a system with 3 different GPU cards in it, all different models. When I set a few options in the config file, I get some unexpected behavior.
If I set the following:
enforce_gpu_index: true
things just start to act strange: the wallet goes out of sync every 10 seconds and resyncs, the Chia GUI keeps going blank and reconnecting, the number of plots does not display properly, and so on. There are also a bunch of disconnect entries in the debug.log file for the harvester dropping its connection to the farmer. A lot seems to break.
Also, if I set the following:
gpu_index: 1 (or to anything other than 0)
things seem to work fine, but on the default GPU there is an artifact process that is loaded but never seems to do anything (see screenshot).
I am running on Ubuntu 24.04 with all roles on the same machine.
Version
2.4.3
What platform are you using?
Linux
What ui mode are you using?
GUI
Relevant log output
No response
Hey @LeroyINC, this is a known issue: the index used by bladebit (greenreaper) is different from the index listed in the nvidia-smi report.
Unfortunately, since the majority of our lead bladebit developer's time is focused on the new plot format, there are no current plans to update the codebase. To determine which GPU index needs to be used in the chia software, one must use trial and error: set an index, see which GPU runs the process, and note that on your side. Repeat for all GPU indexes to ensure the correct ones are being used.
Also, for the ghost process issue, I recommend rebooting the machine to clear them.
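For anyone working through the trial-and-error procedure above, here is a minimal Python sketch for keeping notes on the results. It is only a bookkeeping aid and not part of chia or bladebit; the function names `build_index_map` and `untested_indexes` are hypothetical. Each observation pairs the gpu_index value set in the config with the nvidia-smi index of the GPU that actually ran the process.

```python
def build_index_map(observations):
    """Build {gpu_index set in the chia config: nvidia-smi index that ran the process}.

    `observations` is an iterable of (config_index, smi_index) pairs noted by hand
    after each test run. Conflicting notes for the same config index raise an error
    so that a mistaken observation is not silently overwritten.
    """
    index_map = {}
    for config_index, smi_index in observations:
        previous = index_map.get(config_index)
        if previous is not None and previous != smi_index:
            raise ValueError(
                f"gpu_index {config_index} was seen on nvidia-smi index "
                f"{previous} and on {smi_index}; re-test this index"
            )
        index_map[config_index] = smi_index
    return index_map


def untested_indexes(index_map, gpu_count):
    """Return the config indexes in range(gpu_count) that have no observation yet."""
    return [i for i in range(gpu_count) if i not in index_map]
```

For example, on a 3-GPU machine, after noting that gpu_index 0 ran on nvidia-smi GPU 2 and gpu_index 1 ran on nvidia-smi GPU 0, `untested_indexes(build_index_map([(0, 2), (1, 0)]), 3)` reports that index 2 still needs a test run.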
I know about the Bladebit plotting issue and that is not a big deal; it still works.
But my original post is not about a plotting issue. It is about a farming issue with the Chia software. The strange behavior happens when farming. Setting the value enforce_gpu_index: true makes the machine unable to farm at all.
Hey @LeroyINC, the GPU index mismatch affects both plotting and farming, so if the index is not set properly, the enforce option will not work properly either.
Can you try cycling through the GPU indexes? For each index tested, fully stop chia, verify that all processes are stopped, and then start chia again.
OK, I did some testing.
When I stop the harvester using "chia stop harvester", the process stops and gets cleared off all GPUs.
When I start the harvester using any available GPU index other than 0, a phantom process always starts on GPU 0 (see above screenshot).
When I stop the harvester, all processes go away, including the phantom one on GPU 0.
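The check described above (which GPUs hold a process after the harvester starts or stops) can also be done from nvidia-smi's CSV output instead of a screenshot. The Python sketch below is one way to do it, under these assumptions: the helper names are hypothetical, the indexes are nvidia-smi's numbering (which, per this thread, may not match gpu_index in the chia config), and the process name in the example is made up.

```python
import csv
import io
import subprocess


def _rows(csv_text, width):
    """Parse nvidia-smi 'csv,noheader' text into stripped rows of the given width."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [[field.strip() for field in row] for row in reader if len(row) == width]


def processes_by_gpu(gpu_csv, apps_csv):
    """Map each nvidia-smi GPU index to a list of (pid, process_name) running on it.

    gpu_csv:  output of `nvidia-smi --query-gpu=index,uuid --format=csv,noheader`
    apps_csv: output of `nvidia-smi --query-compute-apps=gpu_uuid,pid,process_name
              --format=csv,noheader`
    """
    uuid_to_index = {uuid: int(index) for index, uuid in _rows(gpu_csv, 2)}
    by_gpu = {index: [] for index in uuid_to_index.values()}
    for uuid, pid, name in _rows(apps_csv, 3):
        if uuid in uuid_to_index:
            by_gpu[uuid_to_index[uuid]].append((int(pid), name))
    return by_gpu


def unexpected_gpus(by_gpu, expected_index):
    """Return nvidia-smi indexes that hold a process but are not the expected GPU."""
    return sorted(i for i, procs in by_gpu.items() if procs and i != expected_index)


def snapshot():
    """Run nvidia-smi twice and return the current per-GPU process map."""
    def query(flag):
        result = subprocess.run(
            ["nvidia-smi", flag, "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout
    return processes_by_gpu(
        query("--query-gpu=index,uuid"),
        query("--query-compute-apps=gpu_uuid,pid,process_name"),
    )
```

Calling `unexpected_gpus(snapshot(), expected_index=1)` after starting the harvester would list GPU 0 if the phantom process is present, and an empty map after "chia stop harvester" would confirm that everything was cleared.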
Hey @LeroyINC , thank you for the troubleshooting and information.
Would you be able to provide some more verbose logging for the issues on the harvester?
Note: if you are setting logs to debug mode with chia configure --log-level DEBUG,
we first recommend removing your passphrase, as the passphrase is printed in plain text in one of the debug logs.
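If removing the passphrase is not practical, another precaution is to scrub the log text before attaching it. The Python sketch below is a generic string replacement, not a chia feature; the function name `redact_secret` and the placeholder text are hypothetical. It assumes the passphrase appears verbatim in the log, as described above.

```python
def redact_secret(log_text, secret, placeholder="[REDACTED]"):
    """Replace every verbatim occurrence of `secret` in `log_text`.

    Returns (redacted_text, number_of_replacements) so the caller can see
    whether the secret was actually present in the log.
    """
    if not secret:
        raise ValueError("secret must be a non-empty string")
    count = log_text.count(secret)
    return log_text.replace(secret, placeholder), count
```

A replacement count of zero means the passphrase was not found verbatim, which does not rule out it appearing in an encoded or partial form, so a manual skim of the log before posting is still worthwhile.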
Thank you for the information and additional logs!
This issue has not been updated in 14 days and is now flagged as stale. If this issue is still affecting you and in need of further review, please comment on it with an update to keep it from auto closing in 7 days.
This issue was automatically closed because it has been flagged as stale, and subsequently passed 7 days with no further activity from the submitter or watchers.