YapengTian/TDAN-VSR-CVPR-2020

RuntimeError: CUDNN_STATUS_EXECUTION_FAILED

mehranjeelani opened this issue · 8 comments

I get the following error when I use your trained model to test on the Vid4 dataset. I was able to compile the deformable convolution, and I have torch 0.3.1, Python 3.6, and CUDA 9.
Kindly help!
Traceback (most recent call last):
File "eval.py", line 117, in
output, _ = model(lr)
File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in call
result = self.forward(*input, **kwargs)
File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 73, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 83, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 67, in parallel_apply
raise output
File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 42, in _worker
output = module(*input, **kwargs)
File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in call
result = self.forward(*input, **kwargs)
File "/data2/superresolution/video_sr/TDAN-VSR/model.py", line 225, in forward
out = self.relu(self.conv_first(y))
File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in call
result = self.forward(*input, **kwargs)
File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 282, in forward
self.padding, self.dilation, self.groups)
File "/data3/conda/envs/mehran/torch031/lib/python3.6/site-packages/torch/nn/functional.py", line 90, in conv2d
return f(input, weight, bias)
RuntimeError: CUDNN_STATUS_EXECUTION_FAILED

Did you change any code, or did you run the given test examples as-is? It seems the issue comes from GPU data parallelism. Sorry for the very late response.
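
If it helps while debugging, here is a minimal sketch (not the repository's code) of pinning inference to a single GPU and unwrapping a DataParallel-saved checkpoint; the `model/model.pt` path matches the eval.py default and is an assumption:

```python
import os
# Make only one GPU visible before torch initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

model = torch.load("model/model.pt")
# If the checkpoint was saved while wrapped in DataParallel,
# unwrap it so inference runs on a single device.
if isinstance(model, torch.nn.DataParallel):
    model = model.module
model = model.cuda()
model.eval()
```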

Yes, I changed the code a bit. I am actually testing on my own custom dataset: my test directory is just the path to a folder containing all the frames, and I changed the code accordingly.
Here is my Python code for eval.py:

import argparse
import sys
import scipy
import os

# Select the GPU before torch initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

from PIL import Image
import torch
import numpy as np
from skimage import io, transform
from model import ModelFactory
from torch.autograd import Variable
import time

description = 'Video Super Resolution pytorch implementation'

def forward_x8(lr, forward_function=None):
    # Self-ensemble: run the model on flipped copies of the input
    # and average the un-flipped outputs.
    def _transform(v, op):
        v = v.float()
        v2np = v.data.cpu().numpy()
        if op == 'v':
            tfnp = v2np[:, :, :, :, ::-1].copy()
        elif op == 'h':
            tfnp = v2np[:, :, :, ::-1, :].copy()
        elif op == 't':
            tfnp = v2np.transpose((0, 1, 2, 4, 3)).copy()
        return Variable(torch.Tensor(tfnp).cuda())

    def _transform_back(v, op):
        if op == 'v':
            tfnp = v[:, :, :, ::-1].copy()
        elif op == 'h':
            tfnp = v[:, :, ::-1, :].copy()
        elif op == 't':
            tfnp = v.transpose((0, 1, 3, 2)).copy()
        return tfnp

    # Build the augmented batch: original, v-flip, h-flip, vh-flip.
    x = [lr]
    for tf in 'v', 'h':
        x.extend([_transform(_x, tf) for _x in x])

    list_r = []
    for k in range(len(x)):
        r, _ = forward_function(x[k])
        r = r.data.cpu().numpy()
        # Undo the flips before averaging.
        if k % 4 > 1:
            r = _transform_back(r, 'h')
        if (k % 4) % 2 == 1:
            r = _transform_back(r, 'v')
        list_r.append(r)
    y = np.sum(list_r, axis=0) / 4.0

    y = Variable(torch.Tensor(y).cuda())
    if len(y) == 1:
        y = y[0]
    return y
def quantize(img, rgb_range):
    # Clamp and round to valid 8-bit pixel values.
    return img.mul(255 / rgb_range).clamp(0, 255).round()


parser = argparse.ArgumentParser(description=description)

parser.add_argument('-m', '--model', metavar='M', type=str, default='TDAN',
                    help='network architecture.')
parser.add_argument('-s', '--scale', metavar='S', type=int, default=4, 
                    help='interpolation scale. Default 4')
parser.add_argument('-t', '--test-set', metavar='NAME', type=str, default='../datasets/KLE_1519',
                    help='dataset for testing.')
parser.add_argument('-mp', '--model-path', metavar='MP', type=str, default='model',
                    help='model path.')
parser.add_argument('-sp', '--save-path', metavar='SP', type=str, default='res/KLE_1519_sr',
                    help='saving directory path.')
args = parser.parse_args()

model_factory = ModelFactory()
model = model_factory.create_model(args.model)
dir_LR = args.test_set
model_path = os.path.join(args.model_path, 'model.pt')
if not os.path.exists(model_path):
    raise Exception('Cannot find %s.' % model_path)
# torch.load restores the full serialized model, replacing the
# freshly constructed one above.
model = torch.load(model_path)
model.eval()
path = args.save_path
if not os.path.exists(path):
    os.makedirs(path)

# The original loop iterated over multiple sequences; here the test
# directory itself is a single sequence of frames.
for i in range(1):
    LR = dir_LR
    ims = sorted(os.listdir(LR))
    # number of frames in the sequence
    num = len(ims)
    image = io.imread(os.path.join(LR, ims[0]))
    row, col, ch = image.shape
    frames_lr = np.zeros((5, int(row), int(col), ch))
    for j in range(num):
        # Gather a 5-frame window centered on frame j, handling
        # out-of-range indices at the sequence boundaries.
        for k in range(j - 2, j + 3):
            idx = k - j + 2
            if k < 0:
                k = -k
            if k >= num:
                k = num - 3
            frames_lr[idx, :, :, :] = io.imread(os.path.join(LR, ims[k]))
        start = time.time()
        # Normalize to [-0.5, 0.5] and reshape to (1, 5, C, H, W).
        frames_lr = frames_lr / 255.0 - 0.5
        lr = torch.from_numpy(frames_lr).float().permute(0, 3, 1, 2)
        lr = Variable(lr.cuda()).unsqueeze(0).contiguous()
        output, _ = model(lr)
        #output = forward_x8(lr, model)  # optional self-ensemble
        output = (output.data + 0.5) * 255
        output = quantize(output, 255)
        output = output.squeeze(dim=0)
        elapsed_time = time.time() - start
        print(elapsed_time)
        img_name = os.path.join(path, ims[j])
        Image.fromarray(np.around(output.cpu().numpy().transpose(1, 2, 0)).astype(np.uint8)).save(img_name)


I have the same problem. Did you solve it?

Hi. No, I actually used another model, which gave better results. I didn't bother to fix this.

Sorry, I missed it. @Jin-97, do you still have the problem? It is pretty weird to see a parallel issue when only one GPU is used.

I reconfigured the dependencies (python=3.6.6, torch=0.3.1, cuda=9.1), and that seems to have solved the problem, because a new one has emerged: RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58.
I am just running the test code; my GPU is a GTX 1080. Where can I change the batch size? Or do you have any suggestions?

If you are training the model, using a smaller batch size is a good choice. If you are running testing, I suggest you use the chop_forward function in the solver (https://github.com/YapengTian/TDAN-VSR-CVPR-2020/blob/master/solver.py), which splits the whole video frames into smaller patches.
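
For reference, here is a minimal sketch of the idea behind chop_forward, written in the common EDSR style; the actual function in solver.py may differ in signature and details. It assumes the 5D input layout (batch, frames, channels, H, W) used in the eval.py above, and the `min_size` threshold and 8-pixel overlap are illustrative values:

```python
import torch
from torch.autograd import Variable

def chop_forward_sketch(model, lr, scale=4, min_size=160000):
    # lr: Variable of shape (B, T, C, H, W). Split the frames into
    # four overlapping spatial quadrants and recurse until each
    # patch is small enough to fit in GPU memory.
    b, t, c, h, w = lr.size()
    h_half, w_half = h // 2, w // 2
    h_size, w_size = h_half + 8, w_half + 8  # small overlap to avoid seams
    inputs = [
        lr[:, :, :, 0:h_size, 0:w_size],
        lr[:, :, :, 0:h_size, (w - w_size):w],
        lr[:, :, :, (h - h_size):h, 0:w_size],
        lr[:, :, :, (h - h_size):h, (w - w_size):w],
    ]
    if h_size * w_size < min_size:
        outputs = [model(patch)[0] for patch in inputs]
    else:
        outputs = [chop_forward_sketch(model, patch, scale, min_size)
                   for patch in inputs]

    # Stitch the four upscaled quadrants back together.
    h, w = scale * h, scale * w
    h_half, w_half = scale * h_half, scale * w_half
    h_size, w_size = scale * h_size, scale * w_size
    output = Variable(lr.data.new(b, c, h, w))
    output[:, :, 0:h_half, 0:w_half] = outputs[0][:, :, 0:h_half, 0:w_half]
    output[:, :, 0:h_half, w_half:w] = outputs[1][:, :, 0:h_half, (w_size - w + w_half):w_size]
    output[:, :, h_half:h, 0:w_half] = outputs[2][:, :, (h_size - h + h_half):h_size, 0:w_half]
    output[:, :, h_half:h, w_half:w] = outputs[3][:, :, (h_size - h + h_half):h_size, (w_size - w + w_half):w_size]
    return output
```

You would call it in place of `output, _ = model(lr)` in the loop above; peak memory then scales with the patch size rather than the full frame.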

Thanks~~~