BUG no#1 RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
ashuezy opened this issue · 5 comments
root@1192704b450d:/opt/github/LFD-A-Light-and-Fast-Detector/WIDERFACE_train# python3 predict.py
<class 'lfd.model.lfd.LFD'>
Traceback (most recent call last):
File "predict.py", line 26, in <module>
results = config_dict['model'].predict_for_single_image(image, aug_pipeline=simple_widerface_val_pipeline, classification_threshold=0.5, nms_threshold=0.3)
File "../lfd/model/lfd.py", line 553, in predict_for_single_image
predicted_classification, predicted_regression = self.forward(data_batch)
File "../lfd/model/lfd.py", line 493, in forward
backbone_outputs = self._backbone(x)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "../lfd/model/backbone/lfd_resnet.py", line 479, in forward
x = self._stem(x)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 119, in forward
input = module(input)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 399, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 396, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
@ashuezy you have to check that PyTorch is installed correctly, with a build that matches the CUDA and cuDNN versions on your system.
As far as I can see, there is no compatible Docker image for the following:
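One way to run that check is a short script like the one below. It is only a sketch, not something from the LFD repo: the function name `cudnn_report` and the dictionary keys are made up for illustration. It prints which CUDA/cuDNN versions PyTorch sees, then runs a single small convolution on the GPU, which is the same kind of call that fails in the traceback above.

```python
def cudnn_report():
    """Collect PyTorch / CUDA / cuDNN details (hypothetical helper, not part of LFD)."""
    report = {"torch_installed": False}
    try:
        import torch
    except ImportError:
        # PyTorch is not importable in this interpreter at all.
        return report

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    report["built_with_cuda"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    report["cudnn_available"] = torch.backends.cudnn.is_available()
    try:
        # Integer such as 8004 for cuDNN 8.0.4; None if cuDNN is unavailable.
        report["cudnn_version"] = torch.backends.cudnn.version()
    except RuntimeError as exc:
        # Raised when the loaded cuDNN does not match the one PyTorch was built with.
        report["cudnn_version"] = None
        report["cudnn_error"] = str(exc)

    if report["cuda_available"]:
        report["device"] = torch.cuda.get_device_name(0)
        try:
            # Smoke test: one tiny convolution on the GPU, independent of the LFD code.
            conv = torch.nn.Conv2d(3, 8, kernel_size=3).cuda()
            with torch.no_grad():
                out = conv(torch.randn(1, 3, 32, 32, device="cuda"))
            report["conv_ok"] = tuple(out.shape) == (1, 8, 30, 30)
        except RuntimeError as exc:
            report["conv_ok"] = False
            report["conv_error"] = str(exc)
    return report


if __name__ == "__main__":
    for key, value in cudnn_report().items():
        print(f"{key}: {value}")
```

If the small convolution fails with the same cuDNN error, the problem is most likely in the environment (PyTorch build, driver, or container) rather than in `predict.py`.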
CUDA 10.2, CUDNN 8.0.4, TensorRT 7.2.2.3
https://docs.nvidia.com/deeplearning/tensorrt/container-release-notes/rel_21-02.html#rel_21-02
Can you specify which NVIDIA Docker image is compatible with your build?
Here are the Docker setup commands.
-> pull the docker image
docker pull nvcr.io/nvidia/tensorrt:20.09-py3
-> open a bash shell inside the container
docker run -it --gpus all nvcr.io/nvidia/tensorrt:20.09-py3 /bin/bash
-> install pytorch
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
-> check out the repo
mkdir -p /opt/github/
cd /opt/github/
git clone --recursive https://github.com/YonghaoHe/LFD-A-Light-and-Fast-Detector
-> look up the container id in a second terminal (terminal2)
docker container ps
-> copy OneDrive-2021-03-08.zip and libjpeg-turbo-2.0.5.tar.gz into the container from terminal2
docker cp OneDrive-2021-03-08.zip <container_id>:/opt/github/LFD-A-Light-and-Fast-Detector
docker cp libjpeg-turbo-2.0.5.tar.gz <container_id>:/opt/
-> extract the zip in terminal1 (inside the container)
cd /opt/github/LFD-A-Light-and-Fast-Detector
unzip OneDrive-2021-03-08.zip
-> extract and compile libjpeg-turbo-2.0.5.tar.gz
cd /opt/
tar -xvf libjpeg-turbo-2.0.5.tar.gz
cd libjpeg-turbo-2.0.5
mkdir build
cd build
cmake ..
make
cp libturbojpeg.so.0.2.0 /opt/github/LFD-A-Light-and-Fast-Detector/lfd/data_pipeline/dataset/utils/libs/
-> install the repo
cd /opt/github/LFD-A-Light-and-Fast-Detector/
python setup.py build_ext
pip install opencv-python
apt-get install -y libgl1-mesa-dev
pip install albumentations
pip install pycocotools
cd /opt/github/LFD-A-Light-and-Fast-Detector/WIDERFACE_train
changes in predict.py
@ashuezy where can I find the OneDrive zip file?