BUG no#1 RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

Question

ashuezy opened this issue 4 years ago · 5 comments

root@1192704b450d:/opt/github/LFD-A-Light-and-Fast-Detector/WIDERFACE_train# python3 predict.py 
<class 'lfd.model.lfd.LFD'>
Traceback (most recent call last):
  File "predict.py", line 26, in <module>
    results = config_dict['model'].predict_for_single_image(image, aug_pipeline=simple_widerface_val_pipeline, classification_threshold=0.5, nms_threshold=0.3)
  File "../lfd/model/lfd.py", line 553, in predict_for_single_image
    predicted_classification, predicted_regression = self.forward(data_batch)
  File "../lfd/model/lfd.py", line 493, in forward
    backbone_outputs = self._backbone(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "../lfd/model/backbone/lfd_resnet.py", line 479, in forward
    x = self._stem(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 396, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED

Answer 1 · 2021-03-09T01:57:57.000Z

@ashuezy You need to check that PyTorch is installed correctly and was built for the CUDA/cuDNN versions present in your environment.
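One way to run that check is to ask PyTorch what it was built against and what it can see at runtime. The snippet below is a minimal sketch, not part of the LFD repo; the helper name describe_torch_build is made up for illustration, and it only uses the standard torch.version.cuda, torch.backends.cudnn and torch.cuda.is_available() calls.

```python
def describe_torch_build():
    """Collect the CUDA/cuDNN facts PyTorch reports about itself.

    Hypothetical helper for illustration; it is not part of the LFD repo.
    """
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter at all.
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built against (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        # Whether this build was compiled with cuDNN support.
        "cudnn_available": torch.backends.cudnn.is_available(),
        # Whether a usable GPU and driver are visible at runtime.
        "gpu_visible": torch.cuda.is_available(),
    }
    try:
        # cuDNN version PyTorch loads (None if cuDNN is unavailable).
        info["cudnn_version"] = torch.backends.cudnn.version()
    except RuntimeError as err:
        # Raised when the cuDNN found at runtime does not match the build.
        info["cudnn_version"] = f"error: {err}"
    return info


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

If built_with_cuda does not match the CUDA toolkit in the container, or gpu_visible is False, the cuDNN initialization error above is a plausible symptom, though other causes (for example, running out of GPU memory) can produce the same message.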

Answer 2 · 2021-03-09T08:44:05.000Z

As far as I can see, there is no compatible Docker image for the following:
CUDA 10.2, CUDNN 8.0.4, TensorRT 7.2.2.3

https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/rel_21-02.html#rel_21-02
Can you specify which NVIDIA Docker image is compatible with your build?

Answer 3 · 2021-03-09T15:10:24.000Z

Here are the Docker setup commands.

-> pull the Docker image
docker pull nvcr.io/nvidia/tensorrt:20.09-py3

-> open a bash shell inside the container
docker run -it --gpus all nvcr.io/nvidia/tensorrt:20.09-py3 /bin/bash

-> install PyTorch
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html

-> clone the repo
mkdir -p /opt/github/
cd /opt/github/
git clone --recursive https://github.com/YonghaoHe/LFD-A-Light-and-Fast-Detector

-> find the container ID (in terminal 2, on the host)
docker container ps

-> copy OneDrive-2021-03-08.zip and libjpeg-turbo-2.0.5.tar.gz into the container (in terminal 2)
docker cp OneDrive-2021-03-08.zip <container_id>:/opt/github/LFD-A-Light-and-Fast-Detector
docker cp libjpeg-turbo-2.0.5.tar.gz <container_id>:/opt/

-> extract the zip (in terminal 1, inside the container)
cd /opt/github/LFD-A-Light-and-Fast-Detector
unzip OneDrive-2021-03-08.zip

-> extract and compile libjpeg-turbo-2.0.5.tar.gz
cd /opt/
tar -xvf libjpeg-turbo-2.0.5.tar.gz
cd libjpeg-turbo-2.0.5
mkdir build
cd build
cmake ..
make
cp libturbojpeg.so.0.2.0 /opt/github/LFD-A-Light-and-Fast-Detector/lfd/data_pipeline/dataset/utils/libs/

-> build the repo and install its dependencies
cd /opt/github/LFD-A-Light-and-Fast-Detector/
python setup.py build_ext

pip install opencv-python
apt-get install -y libgl1-mesa-dev
pip install albumentations
pip install pycocotools

cd /opt/github/LFD-A-Light-and-Fast-Detector/WIDERFACE_train

-> changes in predict.py

Add this to the top:
import sys
sys.path.append('..')

Change the import to:
from WIDERFACE_LFD_XS import config_dict, prepare_model

Change the parameter file path to:
param_file_path = './../epoch_1000.pth'

Change the last 3 lines so the result is written to a file instead of shown in a window:
cv2.imwrite('output.jpg', image)
#cv2.imshow('im', image)
#cv2.waitKey()

-> run the prediction
python predict.py
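
The predict.py changes above could also be written so they do not depend on the current working directory or on hand-commenting the display calls. This is only a sketch under assumptions: the helper names (repo_root_for, add_repo_to_path, has_display, show_or_save) are hypothetical and not part of the LFD repo, and checking the DISPLAY environment variable is a rough X11-style heuristic for "no window can be opened", which is the usual situation inside a container.

```python
import os
import sys
from pathlib import Path


def repo_root_for(script_path):
    """Return the directory one level above the script's own folder."""
    return Path(script_path).absolute().parent.parent


def add_repo_to_path(script_path):
    """Equivalent of sys.path.append('..'), but independent of the cwd."""
    root = str(repo_root_for(script_path))
    if root not in sys.path:
        sys.path.append(root)
    return root


def has_display(environ=None):
    """Guess whether a GUI window can be opened (assumes an X11-style DISPLAY)."""
    environ = os.environ if environ is None else environ
    return bool(environ.get("DISPLAY"))


def show_or_save(image, output_path="output.jpg"):
    """Show the image if a display seems available, otherwise write it to disk."""
    import cv2  # imported lazily so the helpers above work without OpenCV

    if has_display():
        cv2.imshow("im", image)
        cv2.waitKey()
    else:
        cv2.imwrite(output_path, image)
```

In predict.py this would be used as add_repo_to_path(__file__) before the WIDERFACE_LFD_XS import, and show_or_save(image) in place of the last three lines.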

Answer 4 · 2021-03-09T15:50:49.000Z

@ashuezy That's great!

Answer 5 · 2022-09-29T07:49:14.000Z

@ashuezy Where can I find the OneDrive zip file?