Upgrading project to Pytorch 1.12.1 + CUDA 11.6 - No CUDA GPUs are available
Hi!
I've been testing ZITS_inpainting on my CPU very successfully, but I have an RTX 3060 on Windows 10 and can't get it running on my GPU with the original setup described in the 'Preparations' section.
So I decided to upgrade the project dependencies as best I could. The settings below worked for me:
```
conda create -n train_env python=3.10
conda activate train_env
conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge
pip install -r requirement.txt   # all packages set to their latest versions
```
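For anyone replicating this, a quick sanity check from inside the environment is worthwhile before running the test; these are just the standard torch introspection calls, nothing project-specific:

```python
import torch

print(torch.__version__)             # expect 1.12.1
print(torch.version.cuda)            # expect 11.6
print(torch.cuda.is_available())     # should be True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # should report the RTX 3060
```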
However, when running the single-image test with the settings below and --GPU_ids='0', I get the following output:
```
python single_image_test.py --img_path="D:\GANProjects\ZITS_inpainting-main\Tests\Test_Image_1.png" --mask_path="D:\GANProjects\ZITS_inpainting-main\Tests\Test_Image_Mask.png" --save_path="D:\GANProjects\ZITS_inpainting-main\Tests\Results\Test_Image_1.png" --config_file ./ckpt/config.yml --GPU_ids='0'
```
File "D:\GANProjects\ZITS_inpainting-main\single_image_test.py", line 356, in <module>
model = ZITS(config, 0, 0, True, True)
File "D:\GANProjects\ZITS_inpainting-main\src\FTR_trainer.py", line 256, in __init__
self.inpaint_model = DefaultInpaintingTrainingModule(config, gpu=gpu, rank=rank, test=test, **kwargs).to(gpu)
File "D:\GANProjects\ZITS_inpainting-main\src\models\FTR_model.py", line 424, in __init__
super().__init__(*args, gpu=gpu, name='InpaintingModel', rank=rank, test=test, **kwargs)
File "D:\GANProjects\ZITS_inpainting-main\src\models\FTR_model.py", line 156, in __init__
self.str_encoder = StructureEncoder(config).cuda(gpu)
File "C:\Miniconda3\envs\train_env\lib\site-packages\torch\nn\modules\module.py", line 747, in cuda
return self._apply(lambda t: t.cuda(device))
File "C:\Miniconda3\envs\train_env\lib\site-packages\torch\nn\modules\module.py", line 639, in _apply
module._apply(fn)
File "C:\Miniconda3\envs\train_env\lib\site-packages\torch\nn\modules\module.py", line 639, in _apply
module._apply(fn)
File "C:\Miniconda3\envs\train_env\lib\site-packages\torch\nn\modules\module.py", line 662, in _apply
param_applied = fn(param)
File "C:\Miniconda3\envs\train_env\lib\site-packages\torch\nn\modules\module.py", line 747, in <lambda>
return self._apply(lambda t: t.cuda(device))
File "C:\Miniconda3\envs\train_env\lib\site-packages\torch\cuda\__init__.py", line 227, in _lazy_init
torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
PyTorch 1.12.1 and CUDA 11.6 are installed, and my GPU is visible and recognized inside the train_env environment.
I'm having a difficult time figuring out what exactly the issue is.
As you folks are most familiar with the codebase, could you point me in the right direction to get this running on my GPU?
Hello, I'm afraid I'm not familiar with running the model on Windows.
Following the call stack, it seems to stop at line 156 of FTR_model.py:
`self.str_encoder = StructureEncoder(config).cuda(gpu)`
So I stepped through every line of code in there with debug statements, and everything printed out fine.
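For anyone retracing this, the checks I added were along these lines (placement and wording are mine, not part of the repo):

```python
import os
import torch

# Dropped in just before the failing .cuda(gpu) call in FTR_model.py
print("CUDA_VISIBLE_DEVICES =", os.environ.get('CUDA_VISIBLE_DEVICES'))
print("torch.cuda.is_available() =", torch.cuda.is_available())
print("torch.cuda.device_count() =", torch.cuda.device_count())
```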
It turns out that all I had to do to remove the error I was getting with --GPU_ids='0' was to bypass it by setting os.environ['CUDA_VISIBLE_DEVICES'] = '0' directly.
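My best guess, which I haven't verified against the codebase, is a quoting issue: cmd.exe does not strip single quotes, so --GPU_ids='0' may reach Python as the literal string '0' (quotes included), and if the script copies that value into CUDA_VISIBLE_DEVICES, CUDA sees no valid device index. Setting the variable myself at the top of single_image_test.py, before anything touches CUDA, sidesteps that:

```python
import os

# Must run before the first CUDA call; safest is before importing torch.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import torch
print(torch.cuda.is_available())  # True on my RTX 3060 after this change
```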
I can confirm that the single-image test now runs to completion, and monitoring my GPU's stats shows it is indeed using the CUDA cores and GPU memory.
Thank you for your patience with all my questions! 🙏