RuntimeError: CUDNN_STATUS_EXECUTION_FAILED
mehranjeelani opened this issue · 8 comments
I get the following error when I use your trained model to test on the Vid4 dataset. I was able to compile the deformable convolution, and my environment is PyTorch 0.3.1, Python 3.6, and CUDA 9.
Kindly help!
```
Traceback (most recent call last):
  File "eval.py", line 117, in <module>
    output, _ = model(lr)
  File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 73, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 83, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
    raise output
  File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 42, in _worker
    output = module(*input, **kwargs)
  File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/data2/superresolution/video_sr/TDAN-VSR/model.py", line 225, in forward
    out = self.relu(self.conv_first(y))
  File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 282, in forward
    self.padding, self.dilation, self.groups)
  File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/functional.py", line 90, in conv2d
    return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_EXECUTION_FAILED
```
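[Editor's note, not from the thread: for opaque cuDNN execution failures like this, a common first debugging step is to make kernel launches synchronous and to fall back from cuDNN to the native kernels, which usually yields a more specific error and a traceback that points at the actual failing op.]

```python
import os
# must be set before CUDA is initialized, i.e. before the first .cuda() call
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch
# fall back to PyTorch's native convolution kernels instead of cuDNN
torch.backends.cudnn.enabled = False
```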
Did you change any code, or did you run the given test examples as-is? The issue seems to come from GPU data parallelism. Sorry for the very late response.
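[Editor's note: the traceback above goes through torch/nn/parallel/data_parallel.py, so the loaded checkpoint is presumably a model wrapped in nn.DataParallel. A minimal sketch of how one could rule the parallel machinery out; this is illustrative, not code from the repo.]

```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose a single GPU before torch touches CUDA

import torch
import torch.nn as nn

model = torch.load("model/model.pt")  # the path used by the eval script below
if isinstance(model, nn.DataParallel):
    # unwrap so forward() runs without the scatter/replicate/gather steps
    model = model.module
model = model.cuda().eval()
```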
Yes, I changed the code a bit. I am actually testing on my own dataset: my test directory is just the path to a folder containing all the frames, and I adjusted the code accordingly.
Here is my eval.py:
```python
import argparse
import sys
import scipy
import os
from PIL import Image
import torch
os.environ["CUDA_VISIBLE_DEVICES"] = "2"
import numpy as np
from skimage import io, transform
from model import ModelFactory
from torch.autograd import Variable
import time

description = 'Video Super Resolution pytorch implementation'

def forward_x8(lr, forward_function=None):
    # self-ensemble: average the model outputs over flipped/transposed inputs
    def _transform(v, op):
        v = v.float()
        v2np = v.data.cpu().numpy()
        #print(v2np.shape)
        if op == 'v':
            tfnp = v2np[:, :, :, :, ::-1].copy()
        elif op == 'h':
            tfnp = v2np[:, :, :, ::-1, :].copy()
        elif op == 't':
            tfnp = v2np.transpose((0, 1, 2, 4, 3)).copy()
        ret = Variable(torch.Tensor(tfnp).cuda())
        #ret = ret.half()
        return ret

    def _transform_back(v, op):
        if op == 'v':
            tfnp = v[:, :, :, ::-1].copy()
        elif op == 'h':
            tfnp = v[:, :, ::-1, :].copy()
        elif op == 't':
            tfnp = v.transpose((0, 1, 3, 2)).copy()
        return tfnp

    x = [lr]
    for tf in 'v', 'h':
        x.extend([_transform(_x, tf) for _x in x])

    list_r = []
    for k in range(len(x)):
        z = x[k]
        r, _ = forward_function(z)
        r = r.data.cpu().numpy()
        if k % 4 > 1:
            r = _transform_back(r, 'h')
        if (k % 4) % 2 == 1:
            r = _transform_back(r, 'v')
        list_r.append(r)

    y = np.sum(list_r, axis=0) / 4.0
    y = Variable(torch.Tensor(y).cuda())
    if len(y) == 1:
        y = y[0]
    return y

def quantize(img, rgb_range):
    return img.mul(255 / rgb_range).clamp(0, 255).round()

parser = argparse.ArgumentParser(description=description)
parser.add_argument('-m', '--model', metavar='M', type=str, default='TDAN',
                    help='network architecture.')
parser.add_argument('-s', '--scale', metavar='S', type=int, default=4,
                    help='interpolation scale. Default 4')
parser.add_argument('-t', '--test-set', metavar='NAME', type=str, default='../datasets/KLE_1519',
                    help='dataset for testing.')
parser.add_argument('-mp', '--model-path', metavar='MP', type=str, default='model',
                    help='model path.')
parser.add_argument('-sp', '--save-path', metavar='SP', type=str, default='res/KLE_1519_sr',
                    help='saving directory path.')
args = parser.parse_args()

model_factory = ModelFactory()
model = model_factory.create_model(args.model)
dir_LR = args.test_set
#lis = sorted(os.listdir(dir_LR))
model_path = os.path.join(args.model_path, 'model.pt')
if not os.path.exists(model_path):
    raise Exception('Cannot find %s.' % model_path)
model = torch.load(model_path)  # note: this replaces the freshly created model above
model.eval()

path = args.save_path
if not os.path.exists(path):
    os.makedirs(path)

#for i in range(len(lis)):
for i in range(1):
    #print(lis[i])
    LR = dir_LR
    ims = sorted(os.listdir(LR))
    num = len(ims)  # number of frames in the sequence
    image = io.imread(os.path.join(LR, ims[0]))
    row, col, ch = image.shape
    frames_lr = np.zeros((5, int(row), int(col), ch))
    for j in range(num):
        # gather the 5-frame temporal window centered on frame j
        for k in range(j - 2, j + 3):
            idx = k - j + 2
            if k < 0:
                k = -k       # reflect indices that fall before the first frame
            if k >= num:
                k = num - 3  # clamp indices that run past the last frame
            frames_lr[idx, :, :, :] = io.imread(os.path.join(LR, ims[k]))
        start = time.time()
        frames_lr = frames_lr / 255.0 - 0.5
        lr = torch.from_numpy(frames_lr).float().permute(0, 3, 1, 2)
        lr = Variable(lr.cuda()).unsqueeze(0).contiguous()
        output, _ = model(lr)
        #output = forward_x8(lr, model)
        output = (output.data + 0.5) * 255
        output = quantize(output, 255)
        output = output.squeeze(dim=0)
        elapsed_time = time.time() - start
        print(elapsed_time)
        img_name = os.path.join(path, ims[j])
        Image.fromarray(np.around(output.cpu().numpy().transpose(1, 2, 0)).astype(np.uint8)).save(img_name)
```
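[Editor's note on this 0.3.x-era script: the input Variable is not marked volatile, so autograd keeps intermediate buffers alive through the forward pass, which needlessly inflates GPU memory at test time. Before torch.no_grad() existed, the inference idiom was:]

```python
import torch
from torch.autograd import Variable

x = torch.randn(1, 5, 3, 64, 64)  # dummy (batch, frames, channels, H, W) input
# volatile=True disables graph construction for the whole downstream forward pass
x = Variable(x.cuda(), volatile=True)
```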
I have the same problem. Did you solve it?
Hi. No, I actually used another model, which gave better results. I didn't bother to fix this.
Sorry, I missed it. @Jin-97, do you still have the problem? It is pretty weird to see a parallel issue since only one GPU is used.
I reconfigured the dependencies (python=3.6.6, torch=0.3.1, cuda=9.1), and that seems to have solved the problem, because a new problem has emerged: RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58.
I am just running the test code, and my GPU is a GTX 1080. Where can I change the batch size? Or do you have any suggestions?
If you are training the model, using a smaller batch size is a good choice. If you are running testing, I suggest you use the chop_forward function in the solver (https://github.com/YapengTian/TDAN-VSR-CVPR-2020/blob/master/solver.py), which splits the whole video frames into smaller patches.
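[Editor's note: for reference, the idea behind chop_forward is roughly the following. This is a simplified sketch written against the thread's 0.3.x-style API, not the repo's exact code; parameter names and defaults here are illustrative. Split the low-resolution input into four overlapping spatial quadrants, super-resolve each one (recursing while a quadrant is still too large), and stitch the upscaled quadrants back together.]

```python
import numpy as np

def chop_forward(model, x, scale=4, shave=8, min_size=160 * 160):
    # x: (batch, frames, channels, H, W) low-res input Variable
    b, n, c, h, w = x.size()
    h_half, w_half = h // 2, w // 2
    h_size, w_size = h_half + shave, w_half + shave  # overlap hides seam artifacts
    parts = [
        x[:, :, :, 0:h_size, 0:w_size],
        x[:, :, :, 0:h_size, (w - w_size):w],
        x[:, :, :, (h - h_size):h, 0:w_size],
        x[:, :, :, (h - h_size):h, (w - w_size):w],
    ]
    if h_size * w_size < min_size:
        # small enough to fit in memory: run the model (TDAN returns (sr, aligned))
        outputs = [model(p.contiguous())[0].data.cpu().numpy() for p in parts]
    else:
        # still too large: recurse until the quadrants fit
        outputs = [chop_forward(model, p, scale, shave, min_size) for p in parts]

    # stitch the four SR quadrants together at the upscaled resolution
    h, w = scale * h, scale * w
    h_half, w_half = scale * h_half, scale * w_half
    h_size, w_size = scale * h_size, scale * w_size
    out = np.zeros((b, outputs[0].shape[1], h, w), dtype=outputs[0].dtype)
    out[:, :, 0:h_half, 0:w_half] = outputs[0][:, :, 0:h_half, 0:w_half]
    out[:, :, 0:h_half, w_half:w] = outputs[1][:, :, 0:h_half, (w_size - w + w_half):w_size]
    out[:, :, h_half:h, 0:w_half] = outputs[2][:, :, (h_size - h + h_half):h_size, 0:w_half]
    out[:, :, h_half:h, w_half:w] = outputs[3][:, :, (h_size - h + h_half):h_size, (w_size - w + w_half):w_size]
    return out
```

In the eval script above, one would call chop_forward(model, lr) in place of model(lr) and convert the returned array back to a tensor before quantizing, trading a few extra forward passes for a much smaller peak memory footprint.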
Thanks~~~