Not using GPU in training
Zach3292 opened this issue · 13 comments
Hi, have you set the trainer to cuda in config.json?
CUDA_TRAINING is set to true but CUDA_INFERENCE is set to false
I didn't change the config file; it is still the default.
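(For reference, a quick way to confirm which flags the trainer will actually read is to print them from config.json. This is a minimal sketch, not the project's own tooling; the file path and the assumption that the keys sit at the top level of the JSON are mine, so adjust them to your install.)

```python
import json
from pathlib import Path

# Minimal sketch: print the CUDA-related flags from config.json.
# Path is an assumption; point it at wherever your config.json lives.
config_path = Path("config.json")
cfg = json.loads(config_path.read_text())

for key in ("CUDA_TRAINING", "CUDA_INFERENCE"):
    print(f"{key} = {cfg.get(key)}")
```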
Strange, then the trainer terminal should be using your GPU while it runs training steps. Can you open another terminal and run nvidia-smi while training steps are being performed?
nvidia-smi says that 50% of your GPU memory is used, but I am not sure whether that comes from Trackmania or from the trainer terminal. What happens if you close the worker terminal and the game, and run nvidia-smi while the trainer terminal is still performing training steps?
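(If it helps, per-process GPU memory can also be listed programmatically, which makes it easier to tell whether the memory belongs to the game or to the trainer process. This is a minimal sketch using the nvidia-ml-py package (pynvml), which is an extra dependency and not part of the project; nvidia-smi's own process table shows the same information.)

```python
import pynvml  # from the nvidia-ml-py package (assumption: installed separately)

# Minimal sketch: list per-process GPU memory so you can tell whether the memory
# nvidia-smi reports belongs to the game (a graphics process) or to the trainer
# (a CUDA compute process).
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
for label, procs in (
    ("compute", pynvml.nvmlDeviceGetComputeRunningProcesses(handle)),
    ("graphics", pynvml.nvmlDeviceGetGraphicsRunningProcesses(handle)),
):
    for p in procs:
        mem_mib = (p.usedGpuMemory or 0) / 1024**2  # may be None on some drivers
        print(f"{label}: pid={p.pid} ~{mem_mib:.0f} MiB")
pynvml.nvmlShutdown()
```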
So weird; I would expect PyTorch to throw an error when CUDA_TRAINING is true if it cannot use CUDA for any reason.
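(For what it's worth, a silent fallback to CPU is possible if the device is selected with a pattern like the one below. This is only a hypothetical sketch to explain the symptom, not the project's actual code.)

```python
import torch

CUDA_TRAINING = True  # hypothetical flag mirroring the config entry

# A common pattern: the flag only expresses intent, and the code quietly falls
# back to CPU when CUDA is unavailable instead of raising an error.
device = torch.device("cuda" if CUDA_TRAINING and torch.cuda.is_available() else "cpu")
print("training device:", device)

# To fail loudly instead, check explicitly before building the model/optimizer:
if CUDA_TRAINING and not torch.cuda.is_available():
    raise RuntimeError("CUDA_TRAINING is true but torch.cuda.is_available() is False")
```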
This is when I use only the laptop; I'll try later with my main computer as the server and trainer to see whether something similar happens.
Even though I don't really understand everything in it, I thought it might give you a clue as to what the problem may be
Yes, that is the setting we use for real training. I have never tried CUDA-enabled training locally on my laptop because I don't even have a CUDA-enabled version of PyTorch there; I just use the laptop to run the worker. Still, it sounds weird that the worker doesn't saturate your laptop GPU; perhaps the CPU is a huge bottleneck in your setting, IDK.
Even though I don't really understand everything in it, I thought it might give you a clue as to what the problem may be
Yup, that sounds relevant; perhaps your PyTorch installation is not compatible with your CUDA version (11.7 according to nvidia-smi)?
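(A quick way to check this from Python is shown below; it is a minimal sketch. A CPU-only wheel reports None for the CUDA version it was built with, which would match the symptom.)

```python
import torch

# Minimal sketch: compare the CUDA version this PyTorch build ships with
# against what the driver reports via nvidia-smi (11.7 in this thread).
print("torch version   :", torch.__version__)
print("built with CUDA :", torch.version.cuda)   # None on a CPU-only build
print("CUDA available  :", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device          :", torch.cuda.get_device_name(0))
```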
Hi, were you able to solve or locate the issue?
Closing for inactivity, as I cannot reproduce the issue. Please reopen if you experience something similar.