Upgrading project to Pytorch 1.12.1 + CUDA 11.6 - No CUDA GPUs are available
Hi!
I've been testing ZITS_inpainting on my CPU very successfully, but I have an RTX 3060 on Windows 10 and can't get it running on my GPU with the original setup described in the 'Preparations' section.
So I decided to upgrade the project dependencies as best I could. The settings below worked for me:
```
conda create -n train_env python=3.10
conda activate train_env
conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge
pip install -r requirement.txt   # all packages set to their latest versions
```
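For anyone replicating this, a quick sanity check from inside the environment is worthwhile before running the test; these are just the standard torch introspection calls, nothing project-specific:

```python
import torch

print(torch.__version__)             # expect 1.12.1
print(torch.version.cuda)            # expect 11.6
print(torch.cuda.is_available())     # should be True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # should report the RTX 3060
```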
However, when running the single-image test with the settings below and --GPU_ids='0', I get the following output:
```
python single_image_test.py --img_path="D:\GANProjects\ZITS_inpainting-main\Tests\Test_Image_1.png" --mask_path="D:\GANProjects\ZITS_inpainting-main\Tests\Test_Image_Mask.png" --save_path="D:\GANProjects\ZITS_inpainting-main\Tests\Results\Test_Image_1.png" --config_file ./ckpt/config.yml --GPU_ids='0'
```
File "D:\GANProjects\ZITS_inpainting-main\single_image_test.py", line 356, in <module>
model = ZITS(config, 0, 0, True, True)
File "D:\GANProjects\ZITS_inpainting-main\src\FTR_trainer.py", line 256, in __init__
self.inpaint_model = DefaultInpaintingTrainingModule(config, gpu=gpu, rank=rank, test=test, **kwargs).to(gpu)
File "D:\GANProjects\ZITS_inpainting-main\src\models\FTR_model.py", line 424, in __init__
super().__init__(*args, gpu=gpu, name='InpaintingModel', rank=rank, test=test, **kwargs)
File "D:\GANProjects\ZITS_inpainting-main\src\models\FTR_model.py", line 156, in __init__
self.str_encoder = StructureEncoder(config).cuda(gpu)
File "C:\Miniconda3\envs\train_env\lib\site-packages\torch\nn\modules\module.py", line 747, in cuda
return self._apply(lambda t: t.cuda(device))
File "C:\Miniconda3\envs\train_env\lib\site-packages\torch\nn\modules\module.py", line 639, in _apply
module._apply(fn)
File "C:\Miniconda3\envs\train_env\lib\site-packages\torch\nn\modules\module.py", line 639, in _apply
module._apply(fn)
File "C:\Miniconda3\envs\train_env\lib\site-packages\torch\nn\modules\module.py", line 662, in _apply
param_applied = fn(param)
File "C:\Miniconda3\envs\train_env\lib\site-packages\torch\nn\modules\module.py", line 747, in <lambda>
return self._apply(lambda t: t.cuda(device))
File "C:\Miniconda3\envs\train_env\lib\site-packages\torch\cuda\__init__.py", line 227, in _lazy_init
torch._C._cuda_init()
RuntimeError: No CUDA GPUs are available
PyTorch 1.12.1 and CUDA 11.6 are installed, and my GPU is visible and recognized inside the train_env environment.
I'm having a difficult time figuring out what exactly the issue is.
As you folks are most familiar with the codebase, could you point me in the right direction to get this running on my GPU?
Hello, I'm afraid I'm not familiar with running the model on Windows.
Following the call stack, it seems to stop at line 156 of FTR_model.py:
`self.str_encoder = StructureEncoder(config).cuda(gpu)`
So I stepped through every line of code in there with debug statements, and everything printed out fine.
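For anyone retracing this, the checks I added were along these lines (placement and wording are mine, not part of the repo):

```python
import os
import torch

# Dropped in just before the failing .cuda(gpu) call in FTR_model.py
print("CUDA_VISIBLE_DEVICES =", os.environ.get('CUDA_VISIBLE_DEVICES'))
print("torch.cuda.is_available() =", torch.cuda.is_available())
print("torch.cuda.device_count() =", torch.cuda.device_count())
```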
It turns out that all I had to do to remove the error I was getting with --GPU_ids='0' was to bypass it by setting os.environ['CUDA_VISIBLE_DEVICES'] = '0' directly.
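My best guess, which I haven't verified against the codebase, is a quoting issue: cmd.exe does not strip single quotes, so --GPU_ids='0' may reach Python as the literal string '0' (quotes included), and if the script copies that value into CUDA_VISIBLE_DEVICES, CUDA sees no valid device index. Setting the variable myself at the top of single_image_test.py, before anything touches CUDA, sidesteps that:

```python
import os

# Must run before the first CUDA call; safest is before importing torch.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import torch
print(torch.cuda.is_available())  # True on my RTX 3060 after this change
```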
I can confirm that the single-image test now runs to completion, and monitoring my GPU's stats shows it is indeed using the CUDA cores and GPU memory.
Thank you for your patience with all my questions! 🙏