RTX cards require minimum Pytorch 1.0 [CUDNN_STATUS_EXECUTION_FAILED]
On Linux Mint 19.1 with an RTX 2070, running recognition with the default installation gives:
(p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ python P2PaLA.py --config config_ALAR_min_model_17_12_18.txt --prev_model ALAR_min_model_17_12_18.pth --prod_data ./images/
2019-01-21 13:42:19,280 - optparse - INFO - Reading configuration from config_ALAR_min_model_17_12_18.txt
2019-01-21 13:42:19,282 - P2PaLA - INFO - Working on prod inference...
2019-01-21 13:42:19,283 - P2PaLA - INFO - Results will be saved to ./work/results/prod
2019-01-21 13:42:19,599 - P2PaLA - INFO - Resumming from model ALAR_min_model_17_12_18.pth
/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/cuda/__init__.py:95: UserWarning:
Found GPU0 GeForce RTX 2070 which requires CUDA_VERSION >= 9000 for
optimal performance and fast startup time, but your PyTorch was compiled
with CUDA_VERSION 8000. Please install the correct PyTorch binary
using instructions from http://pytorch.org
warnings.warn(incorrect_binary_warn % (d, name, 9000, CUDA_VERSION))
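The CUDA_VERSION numbers in the warning above (8000 and 9000) appear to follow the usual CUDA encoding of major * 1000 + minor * 10. A minimal sketch for decoding them, assuming that encoding; the helper names are hypothetical and not part of PyTorch:

```python
def decode_cuda_version(code):
    """Split a CUDA_VERSION integer such as 9000 into (major, minor).

    Assumes the encoding major * 1000 + minor * 10.
    """
    major = code // 1000
    minor = (code % 1000) // 10
    return major, minor


def format_cuda_version(code):
    """Render a CUDA_VERSION integer as a dotted string, e.g. 9010 -> '9.1'."""
    major, minor = decode_cuda_version(code)
    return "{}.{}".format(major, minor)


if __name__ == "__main__":
    # 8000 is what the installed PyTorch was built with; 9000 is what the warning asks for.
    for code in (8000, 9000, 9010, 10000):
        print(code, "->", format_cuda_version(code))
```

Read this way, the warning says the binary was built against CUDA 8.0 while the card wants at least CUDA 9.0.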
So I installed latest torch and torchvision:
(p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ pip install --ignore-installed torch torchvision
Then ran recognition:
(p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ python P2PaLA.py --config config_ALAR_min_model_17_12_18.txt --prev_model ALAR_min_model_17_12_18.pth --prod_data ./images/
/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
warnings.warn(warning.format(ret))
2019-01-21 13:58:31,771 - optparse - INFO - Reading configuration from config_ALAR_min_model_17_12_18.txt
2019-01-21 13:58:31,773 - P2PaLA - INFO - Working on prod inference...
2019-01-21 13:58:31,774 - P2PaLA - INFO - Results will be saved to ./work/results/prod
2019-01-21 13:58:32,125 - P2PaLA - INFO - Resumming from model ALAR_min_model_17_12_18.pth
2019-01-21 13:58:34,859 - P2PaLA - INFO - Preprocessing data from ./images/
P2PaLA.py:1195: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
pr_x = Variable(sample["image"], volatile=True)
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument
2019-01-21 13:58:35,463 - P2PaLA - INFO - Production stage done. total time taken: 0.604010820388794
2019-01-21 13:58:35,463 - P2PaLA - INFO - Average time per page: 0.604010820388794
2019-01-21 13:58:35,463 - P2PaLA - INFO - All Done...
The problem now appears when trying to train:
(p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ python P2PaLA.py --config config_BL_only.txt --tr_data ./data/train --te_data ./data/test --log_comment "_foo"
/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/_reduction.py:49: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
warnings.warn(warning.format(ret))
2019-01-21 14:06:09,788 - optparse - INFO - Reading configuration from config_BL_only.txt
2019-01-21 14:06:09,789 - optparse - DEBUG - Creating output dir: ./work_BL_only
2019-01-21 14:06:09,790 - optparse - DEBUG - Creating checkpoints dir: ./work_BL_only/checkpoints
2019-01-21 14:06:09,790 - P2PaLA - INFO - Working on training stage...
2019-01-21 14:06:09,791 - P2PaLA - WARNING - tensorboardX is not installed, display logger set to OFF.
2019-01-21 14:06:09,791 - P2PaLA - INFO - Preprocessing data from ./data/train
/home/home/Desktop/programs/P2PaLA/nn_models/models.py:293: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
init.uniform(m.weight.data, 0.0, 0.02)
/home/home/Desktop/programs/P2PaLA/nn_models/models.py:298: UserWarning: nn.init.uniform is now deprecated in favor of nn.init.uniform_.
init.uniform(m.weight.data, 1.0, 0.02)
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument
Traceback (most recent call last):
File "P2PaLA.py", line 1262, in <module>
main()
File "P2PaLA.py", line 606, in main
epoch_lossD += d_loss.data[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
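The IndexError above comes from indexing a 0-dim tensor with `d_loss.data[0]`; the message itself suggests `tensor.item()`. A minimal sketch of a helper that works with either style, assuming only duck typing (the name `loss_to_float` is hypothetical and not part of P2PaLA or PyTorch):

```python
def loss_to_float(loss):
    """Return a plain Python float from a loss value.

    Tries the newer `.item()` accessor first, then falls back to the
    older `loss.data[0]` pattern used on line 606 of P2PaLA.py.
    """
    if hasattr(loss, "item"):
        # Newer PyTorch: 0-dim tensors expose .item()
        return float(loss.item())
    if hasattr(loss, "data"):
        # Older PyTorch: the loss wraps a one-element tensor
        return float(loss.data[0])
    # Already a plain number
    return float(loss)
```

With such a helper, the failing line could be written as `epoch_lossD += loss_to_float(d_loss)`. This is only a stopgap; it does not address the other deprecation warnings in the log.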
Hi,
There was a major API change in PyTorch between v0.3.* and v0.4.*, and I'm migrating the code to support it. In the meantime, I recommend staying on PyTorch 0.3.1.
Your GPU needs CUDA 9.0 or later, so please install PyTorch 0.3.1 built for CUDA 9.1 using:
pip uninstall torch torchvision
pip install https://download.pytorch.org/whl/cu91/torch-0.3.1-cp36-cp36m-linux_x86_64.whl
More information about previous PyTorch versions is available on the PyTorch website.
pip uninstall torch torchvision
pip install https://download.pytorch.org/whl/cu91/torch-0.3.1-cp36-cp36m-linux_x86_64.whl
(p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ python P2PaLA.py --config config_BL_only.txt --tr_data ./data/train --te_data ./data/test --log_comment "_foo"
2019-01-21 15:37:56,527 - optparse - INFO - Reading configuration from config_BL_only.txt
2019-01-21 15:37:56,529 - P2PaLA - INFO - Working on training stage...
2019-01-21 15:37:56,529 - P2PaLA - WARNING - tensorboardX is not installed, display logger set to OFF.
2019-01-21 15:37:56,529 - P2PaLA - INFO - Preprocessing data from ./data/train
Traceback (most recent call last):
File "P2PaLA.py", line 1262, in <module>
main()
File "P2PaLA.py", line 528, in main
y_gen = nnG(x)
File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/home/Desktop/programs/P2PaLA/nn_models/models.py", line 94, in forward
return self.model(input_x)
File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/home/Desktop/programs/P2PaLA/nn_models/models.py", line 184, in forward
return F.log_softmax(self.model(input_x), dim=1)
File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/modules/container.py", line 67, in forward
input = module(input)
File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 282, in forward
self.padding, self.dilation, self.groups)
File "/home/home/.conda/envs/p3p/lib/python3.6/site-packages/torch/nn/functional.py", line 90, in conv2d
return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_EXECUTION_FAILED
I don't think the issue is related to your Ubuntu version, but you do need to install the right combination of CUDA and PyTorch.
If you have CUDA 9.1 and Python 3.6 installed, the command I posted before should work. If you have another combination, such as CUDA 9.0 or Python 2.7, you need to find the matching PyTorch build on the PyTorch website.
I just tested it using Python 3.5 and CUDA 9.1 on a GTX 1080 and a TITAN X, and it works (I don't have an RTX card to test with).
Same error, even after installing Cuda 9.1
(p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 410.48 Thu Sep 6 06:36:33 CDT 2018
GCC version: gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)
(p3p) home@home-lnx:~/Desktop/programs/P2PaLA$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85
hmmmm....
It seems that RTX cards don't support CUDA 9.1, which is weird.
Will you consider supporting Cuda 10 via Pytorch 1?
I'm migrating the code to support those changes.
Yes, my goal is to migrate all the code to the latest version of PyTorch, but I'm a bit short of time right now and I don't think I will release a new version in the next couple of weeks.
Thanks for spotting the issue with new GPUs. I will try to migrate the code as soon as possible.
In the meantime, you can run inference on CPU with the available pre-trained model (just add the option --gpu -1).
Hoping that you support Cuda 10, Thank you
home@home-lnx:~/NVIDIA_CUDA-9.1_Samples/1_Utilities/deviceQuery$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce RTX 2070"
CUDA Driver Version / Runtime Version 10.0 / 9.1
CUDA Capability Major/Minor version number: 7.5
Total amount of global memory: 7951 MBytes (8337227776 bytes)
MapSMtoCores for SM 7.5 is undefined. Default to use 64 Cores/SM
MapSMtoCores for SM 7.5 is undefined. Default to use 64 Cores/SM
(36) Multiprocessors, ( 64) CUDA Cores/MP: 2304 CUDA Cores
GPU Max Clock rate: 1815 MHz (1.81 GHz)
Memory Clock rate: 7001 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 4194304 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 3 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 46 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.0, CUDA Runtime Version = 9.1, NumDevs = 1
Result = PASS
@lquirosd Can you share this information, which versions are you using:
- Cudnn
- Cuda
- Pytorch
- Compute
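The versions requested above can be collected in one go. A minimal sketch, assuming PyTorch may or may not be importable in the active environment; the function name `collect_versions` and the dictionary keys are hypothetical:

```python
import sys


def collect_versions():
    """Gather the Python, PyTorch, CUDA and cuDNN versions visible to this interpreter."""
    info = {"python": sys.version.split()[0]}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment
        info["pytorch"] = None
        return info
    info["pytorch"] = torch.__version__
    # CUDA version the PyTorch binary was compiled with (None on CPU-only builds)
    info["cuda_build"] = torch.version.cuda
    info["cudnn"] = torch.backends.cudnn.version()
    if torch.cuda.is_available():
        info["device"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print("{}: {}".format(key, value))
```

Note that `cuda_build` is the toolkit PyTorch was built against, which can differ from what `nvcc --version` reports on the system.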
Yes, the software has been tested on several configurations:
Cudnn: 5,6,7
Cuda: 8,9
Pytorch: 0.3*
Python 2.7, 3.5, 3.6
OS: training on Ubuntu 16.04; testing on Ubuntu 16.04 and Mac OS 10.13
My current set-up is
>>> sys.version
'3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) \n[GCC 7.3.0]'
>>> torch.__version__
'0.3.1'
>>> torch.version.cuda
'8.0.61'
>>> torch.backends.cudnn.version()
7005
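The integer reported by `torch.backends.cudnn.version()` packs the cuDNN release into one number. A minimal sketch for reading it, assuming the encoding major * 1000 + minor * 100 + patch used by cuDNN 7; the helper name is hypothetical:

```python
def decode_cudnn_version(code):
    """Split a cuDNN version integer such as 7005 into (major, minor, patch).

    Assumes the encoding major * 1000 + minor * 100 + patch.
    """
    major = code // 1000
    minor = (code % 1000) // 100
    patch = code % 100
    return major, minor, patch


if __name__ == "__main__":
    # 7005 is the value reported in the session above
    print("cuDNN %d.%d.%d" % decode_cudnn_version(7005))
```

Under that assumption, 7005 corresponds to cuDNN 7.0.5.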
It seems the problem is that RTX cards have compute capability 7.5 and are only supported from CUDA 10 onward, which the Nvidia forums confirmed to me.
@lquirosd Will you consider upgrading to PyTorch 1.0?
Note: CUDA 10 supports compute capability 3.0 – 7.5 (Kepler, Maxwell, Pascal, Volta, Turing)
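The mismatch can be expressed as a small lookup. A simplified sketch, assuming the first CUDA release that targets each architecture is 8.0 for Pascal, 9.0 for Volta and 10.0 for Turing; the table and function names are hypothetical, and the table is deliberately incomplete:

```python
# First CUDA toolkit release assumed to target each compute capability.
MIN_CUDA_FOR_CAPABILITY = {
    (6, 0): (8, 0),   # Pascal
    (6, 1): (8, 0),   # Pascal (e.g. GTX 1080)
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing (e.g. RTX 2070)
}


def runtime_supports_device(capability, runtime):
    """Return True if the CUDA runtime (major, minor) is new enough for the device."""
    required = MIN_CUDA_FOR_CAPABILITY.get(capability)
    if required is None:
        raise KeyError("capability %r is not in the table" % (capability,))
    # Tuples compare element by element, so (9, 1) < (10, 0)
    return runtime >= required


if __name__ == "__main__":
    # Values from the deviceQuery output: capability 7.5, runtime 9.1
    print(runtime_supports_device((7, 5), (9, 1)))
    print(runtime_supports_device((7, 5), (10, 0)))
```

For the RTX 2070 in this thread, the check fails with the CUDA 9.1 runtime and passes with CUDA 10.0, which matches what was observed.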
Hi,
Did you change the "batch_size" parameter to fit your card? The default is 8 images per mini-batch, but the RTX 2070 has only 8 GB of memory. I think it will support a maximum mini-batch of 4 images or so.
Could you please run an experiment using a small mini-batch?
This is not a memory issue. RTX cards (Turing) are first supported in CUDA 10, and PyTorch 1.0 supports CUDA 10, 9, and 8.
So the only solution is to upgrade the code to PyTorch 1.0.
I just released a new branch for PyTorch 1.0:
git clone --single-branch --branch PyTorch-v1.0 https://github.com/lquirosd/P2PaLA.git
Please note that this branch is not fully tested, so some bugs may remain.
I ran some tests on PyTorch 1.0.0, CUDA 9.0 and cuDNN 7401, but CUDA 10 is untested.
May you find peace in your life.
Thank you