trackmania-rl/tmrl

Not using GPU in training

Zach3292 opened this issue · 13 comments

As mentioned in the title, the program doesn't seem to use my NVIDIA GPU (RTX 3050 Ti) to train. Instead, CPU usage jumps to 100%.

[Screenshot: TMRL result]

Hi, have you set the trainer to cuda in config.json?

CUDA_TRAINING is set to true, but CUDA_INFERENCE is set to false.

I didn't change the config file; it is still the default.

Strange, the trainer terminal should be using your GPU when running training steps then. Can you try to open another terminal and run nvidia-smi while training steps are being performed?
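
You could also run a quick check in the same Python environment the trainer uses, to see whether PyTorch itself can reach the GPU. A minimal sketch; the config path assumes the default TmrlData folder in your home directory, adjust if yours differs:

```python
import json
from pathlib import Path

import torch

# Assumed default location of the tmrl config (adjust if yours differs)
config_path = Path.home() / "TmrlData" / "config" / "config.json"
config = json.loads(config_path.read_text())

print("CUDA_TRAINING in config:  ", config.get("CUDA_TRAINING"))
print("torch.cuda.is_available():", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU seen by PyTorch:      ", torch.cuda.get_device_name(0))
```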

So this is what I got from nvidia-smi while running the training; still no GPU usage in Task Manager.
[Screenshot: nvidia-smi output, 2022-07-22]

Nvidia-smi says that 50% of your GPU memory is used, but I am not sure whether this is from Trackmania or from the trainer terminal. What happens if you close the worker terminal and the game and execute nvidia-smi while the trainer terminal is still performing training steps?
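
Another way to tell the two apart is to ask PyTorch's own allocator from inside the trainer process: if these numbers stay near zero while training steps run, nothing the trainer owns is on the GPU. This is just a sketch assuming a single default CUDA device; tmrl does not print this by default as far as I know:

```python
import torch

# Memory held by *this* process through PyTorch's caching allocator.
# The game's VRAM usage will not show up here, only the trainer's tensors.
if torch.cuda.is_available():
    dev = torch.cuda.current_device()
    print(f"allocated: {torch.cuda.memory_allocated(dev) / 1e6:.1f} MB")
    print(f"reserved:  {torch.cuda.memory_reserved(dev) / 1e6:.1f} MB")
else:
    print("This process cannot see any CUDA device.")
```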

So after closing the game and the worker, the trainer still uses the CPU at 95%+, but the VRAM usage in nvidia-smi dropped to 1%, so I believe it was only the game using it.
[Screenshot: nvidia-smi output after closing the game and worker]

So weird; I would expect PyTorch to throw an error if it cannot use CUDA for any reason when CUDA_TRAINING is true.
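
One thing to keep in mind: a very common device-selection pattern never raises when CUDA is unusable, it just silently falls back to the CPU, which would match the 100% CPU usage you see. Illustrative sketch only, not necessarily how tmrl picks its device:

```python
import torch

# Silent fallback: no error is raised if CUDA is unavailable or broken,
# the model simply ends up training on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("selected device:", device)

# Forcing CUDA explicitly, by contrast, fails immediately when it cannot be used:
x = torch.zeros(1, device="cuda")  # raises if no usable CUDA device
print("tensor on:", x.device)
```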

This was with the laptop only. I'll try later with my main computer as the server and trainer to see if something similar happens.

I found this https://discuss.pytorch.org/t/nvidia-geforce-rtx-3050-ti-laptop-gpu-with-cuda-capability-sm-86-is-not-compatible-with-the-current-pytorch-installation/143837

Even though I don't really understand everything in it, I thought it might give you a clue as to what the problem may be.

Yes, that is the setup we use for real training. I have never tried CUDA-enabled training locally on my laptop, because I don't even have a CUDA-enabled version of PyTorch on it; I just use the laptop to run the worker. Still, it sounds weird that the worker doesn't saturate your laptop GPU; perhaps the CPU is a huge bottleneck in your setup, IDK.

Yup, that sounds relevant; perhaps your PyTorch installation is not compatible with your CUDA version (11.7 according to nvidia-smi)?
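
One way to check this hypothesis is to see which GPU architectures your PyTorch build was compiled for; an RTX 3050 Ti needs sm_86. Sketch below; the reinstall command is an assumption, check pytorch.org for the exact command for your setup:

```python
import torch

print("torch version:          ", torch.__version__)
print("compiled against CUDA:  ", torch.version.cuda)          # None means a CPU-only build
print("supported architectures:", torch.cuda.get_arch_list())  # should include 'sm_86'

# If 'sm_86' is missing (or torch.version.cuda is None), reinstalling a wheel
# built for a recent CUDA toolkit should fix it, e.g.:
#   pip install --upgrade torch --index-url https://download.pytorch.org/whl/cu117
```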

Hi, could you solve/locate the issue?

Closing for inactivity, as I cannot reproduce the issue; please reopen if you experience something similar.