sony/nnabla

import nnabla_ext.cuda - ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory

Closed this issue · 35 comments

Forum,

I am getting the following error when checking my nnabla setup:

#check nnabla
import nnabla
import nnabla_ext.cuda
import nnabla.ext_utils as nneu
import nnabla_ext.cudnn

Error:
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory

I used pip install to set up the framework. I also have Anaconda installed.

Bests,
Philip

I think this error is not caused by anaconda.

If you installed CUDA 11.0 correctly, ldconfig -p |grep libcudart will print the location of libcudart.so.11.0.
If this command prints nothing, please refer to https://nnabla.org/install/ to install CUDA.
If it prints a different version such as CUDA 10.2, please uninstall nnabla and nnabla-ext-cuda*, then reinstall nnabla and nnabla-ext-cuda102 with pip (if you use a single GPU).

If you cannot resolve it, please provide the results of pip list|grep nnabla and dpkg -l|grep -e nvidia -e cuda.
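
The same check can be run from Python, in the interpreter that will import nnabla. The sketch below is only an illustration: the helper name missing_libraries is hypothetical, and the library names are the ones that appear in the errors in this thread. It asks the dynamic loader to open each library and reports the ones that fail.

```python
import ctypes


def missing_libraries(sonames):
    """Return the sonames that the dynamic loader cannot open."""
    missing = []
    for name in sonames:
        try:
            # CDLL uses the normal loader search path (ld.so.cache, LD_LIBRARY_PATH).
            ctypes.CDLL(name)
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Names taken from the error messages in this thread; adjust as needed.
    wanted = ["libcudart.so.11.0", "libcusolver.so.10", "libcuda.so.1"]
    for name in missing_libraries(wanted):
        print("cannot load:", name)
```

If this prints nothing, the loader can find every library in the list, and the import error probably has another cause.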

Tomonobu-san,

The issue is that nnabla only accepts the 10.2 and 11.0 drivers. These are old, and our production runs on a newer Nvidia GPU that requires the newer 11.5 and 11.6 drivers.

Are there downloads for any other Nvidia versions?

The libcuda**.so files are not version 11.0 or 10.2, so it just crashes. For example, we have libcudart.so.11.6.

We spent a year developing our models on this framework and now we are unable to put them into production.

Bests,
-philip

We are using a RTX 3090 in production. FYI.

Maybe I am wrong about the driver. But it is reinstalled per Nvidia. It can't find libcusolver.so.10 since libcusolver.so.11 is installed. nnabla was installed with:

pip3 install nnabla nnabla_ext_cuda110 nnabla-converter

Error:
(base) ubuntu@gc-pensive-goldberg:~$ python -c "import nnabla_ext.cuda, nnabla_ext.cudnn"
2022-05-06 18:07:13,529 [nnabla][INFO]: Initializing CPU extension...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/nnabla_ext/cuda/__init__.py", line 131, in <module>
    load_shared_from_error(err)
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/nnabla_ext/cuda/__init__.py", line 67, in load_shared_from_error
    raise err
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/nnabla_ext/cuda/__init__.py", line 122, in <module>
    from .init import (
ImportError: libcusolver.so.10: cannot open shared object file: No such file or directory
(base) ubuntu@gc-pensive-goldberg:~$ ldconfig -p |grep libcusolver.so
libcusolver.so.11 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusolver.so.11
libcusolver.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcusolver.so

(base) ubuntu@gc-pensive-goldberg:~$ pip list|grep nnabla
nnabla 1.28.0
nnabla-converter 1.28.0
nnabla-ext-cuda110 1.28.0

dpkg -l|grep -e nvidia -e cuda

(base) ubuntu@gc-pensive-goldberg:~$ dpkg -l|grep -e nvidia -e cuda
ii cuda 11.6.2-1 amd64 CUDA meta-package
ii cuda-11-6 11.6.2-1 amd64 CUDA 11.6 meta-package
ii cuda-cccl-11-6 11.6.55-1 amd64 CUDA CCCL
ii cuda-command-line-tools-11-6 11.6.2-1 amd64 CUDA command-line tools
ii cuda-compiler-11-6 11.6.2-1 amd64 CUDA compiler
ii cuda-cudart-11-6 11.6.55-1 amd64 CUDA Runtime native Libraries
ii cuda-cudart-dev-11-6 11.6.55-1 amd64 CUDA Runtime native dev links, headers
ii cuda-cuobjdump-11-6 11.6.124-1 amd64 CUDA cuobjdump
ii cuda-cupti-11-6 11.6.124-1 amd64 CUDA profiling tools runtime libs.
ii cuda-cupti-dev-11-6 11.6.124-1 amd64 CUDA profiling tools interface.
ii cuda-cuxxfilt-11-6 11.6.124-1 amd64 CUDA cuxxfilt
ii cuda-demo-suite-11-6 11.6.55-1 amd64 Demo suite for CUDA
ii cuda-documentation-11-6 11.6.124-1 amd64 CUDA documentation
ii cuda-driver-dev-11-6 11.6.55-1 amd64 CUDA Driver native dev stub library
ii cuda-drivers 510.47.03-1 amd64 CUDA Driver meta-package, branch-agnostic
ii cuda-drivers-510 510.47.03-1 amd64 CUDA Driver meta-package, branch-specific
ii cuda-gdb-11-6 11.6.124-1 amd64 CUDA-GDB
ii cuda-libraries-11-6 11.6.2-1 amd64 CUDA Libraries 11.6 meta-package
ii cuda-libraries-dev-11-6 11.6.2-1 amd64 CUDA Libraries 11.6 development meta-package
ii cuda-memcheck-11-6 11.6.124-1 amd64 CUDA-MEMCHECK
ii cuda-nsight-11-6 11.6.124-1 amd64 CUDA nsight
ii cuda-nsight-compute-11-6 11.6.2-1 amd64 NVIDIA Nsight Compute
ii cuda-nsight-systems-11-6 11.6.2-1 amd64 NVIDIA Nsight Systems
ii cuda-nvcc-11-6 11.6.124-1 amd64 CUDA nvcc
ii cuda-nvdisasm-11-6 11.6.124-1 amd64 CUDA disassembler
ii cuda-nvml-dev-11-6 11.6.55-1 amd64 NVML native dev links, headers
ii cuda-nvprof-11-6 11.6.124-1 amd64 CUDA Profiler tools
ii cuda-nvprune-11-6 11.6.124-1 amd64 CUDA nvprune
ii cuda-nvrtc-11-6 11.6.124-1 amd64 NVRTC native runtime libraries
ii cuda-nvrtc-dev-11-6 11.6.124-1 amd64 NVRTC native dev links, headers
ii cuda-nvtx-11-6 11.6.124-1 amd64 NVIDIA Tools Extension
ii cuda-nvvp-11-6 11.6.124-1 amd64 CUDA Profiler tools
ii cuda-repo-ubuntu2004-11-0-local 11.0.3-450.51.06-1 amd64 cuda repository configuration files
ii cuda-runtime-11-6 11.6.2-1 amd64 CUDA Runtime 11.6 meta-package
ii cuda-samples-11-6 11.6.101-1 amd64 CUDA example applications
ii cuda-sanitizer-11-6 11.6.124-1 amd64 CUDA Sanitizer
ii cuda-toolkit-11-6 11.6.2-1 amd64 CUDA Toolkit 11.6 meta-package
ii cuda-toolkit-11-6-config-common 11.6.55-1 all Common config package for CUDA Toolkit 11.6.
ii cuda-toolkit-11-config-common 11.6.55-1 all Common config package for CUDA Toolkit 11.
ii cuda-toolkit-config-common 11.6.55-1 all Common config package for CUDA Toolkit.
ii cuda-tools-11-6 11.6.2-1 amd64 CUDA Tools meta-package
ii cuda-visual-tools-11-6 11.6.2-1 amd64 CUDA visual tools
ii libnvidia-cfg1-510:amd64 510.47.03-0ubuntu1 amd64 NVIDIA binary OpenGL/GLX configuration library
ii libnvidia-common-510 510.47.03-0ubuntu1 all Shared files used by the NVIDIA libraries
ii libnvidia-compute-510:amd64 510.47.03-0ubuntu1 amd64 NVIDIA libcompute package
ii libnvidia-compute-510:i386 510.47.03-0ubuntu1 i386 NVIDIA libcompute package
ii libnvidia-decode-510:amd64 510.47.03-0ubuntu1 amd64 NVIDIA Video Decoding runtime libraries
ii libnvidia-decode-510:i386 510.47.03-0ubuntu1 i386 NVIDIA Video Decoding runtime libraries
ii libnvidia-encode-510:amd64 510.47.03-0ubuntu1 amd64 NVENC Video Encoding runtime library
ii libnvidia-encode-510:i386 510.47.03-0ubuntu1 i386 NVENC Video Encoding runtime library
ii libnvidia-extra-510:amd64 510.47.03-0ubuntu1 amd64 Extra libraries for the NVIDIA driver
ii libnvidia-fbc1-510:amd64 510.47.03-0ubuntu1 amd64 NVIDIA OpenGL-based Framebuffer Capture runtime library
ii libnvidia-fbc1-510:i386 510.47.03-0ubuntu1 i386 NVIDIA OpenGL-based Framebuffer Capture runtime library
ii libnvidia-gl-510:amd64 510.47.03-0ubuntu1 amd64 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii libnvidia-gl-510:i386 510.47.03-0ubuntu1 i386 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii nvidia-compute-utils-510 510.47.03-0ubuntu1 amd64 NVIDIA compute utilities
ii nvidia-dkms-510 510.47.03-0ubuntu1 amd64 NVIDIA DKMS package
ii nvidia-driver-510 510.47.03-0ubuntu1 amd64 NVIDIA driver metapackage
ii nvidia-fs 2.11.0-1 amd64 NVIDIA filesystem for GPUDirect Storage
ii nvidia-fs-dkms 2.11.0-1 amd64 NVIDIA filesystem DKMS package
ii nvidia-gds 11.6.2-1 amd64 GPU Direct Storage meta-package
ii nvidia-gds-11-6 11.6.2-1 amd64 GPU Direct Storage 11.6 meta-package
ii nvidia-kernel-common-510 510.47.03-0ubuntu1 amd64 Shared files used with the kernel module
ii nvidia-kernel-source-510 510.47.03-0ubuntu1 amd64 NVIDIA kernel source package
ii nvidia-modprobe 510.47.03-0ubuntu1 amd64 Load the NVIDIA kernel driver and create device files
ii nvidia-prime 0.8.16~0.20.04.2 all Tools to enable NVIDIA's Prime
ii nvidia-settings 510.47.03-0ubuntu1 amd64 Tool for configuring the NVIDIA graphics driver
ii nvidia-utils-510 510.47.03-0ubuntu1 amd64 NVIDIA driver support binaries
ii screen-resolution-extra 0.18build1 all Extension for the nvidia-settings control panel
ii xserver-xorg-video-nvidia-510 510.47.03-0ubuntu1 amd64 NVIDIA binary Xorg driver

Thank you for trying this and for the detailed information.
I created a CUDA 11.6/cuDNN 8.3 environment (without conda) and got the same error message that you did:

ImportError: libcusolver.so.10: cannot open shared object file: No such file or directory

CUDA drivers are backward compatible, so I think CUDA 11.0 can run on an NVIDIA GPU driver that supports CUDA 11.6.
So, could you please uninstall the CUDA 11.6 runtime environment and install the CUDA 11.0 runtime environment, if possible?
I think you can keep libnvidia and the NVIDIA driver as they are currently installed.

In addition, we will investigate support for newer CUDA versions, but it will take some time.
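
To see the mismatch directly, it can help to list which versions of a library the loader cache actually knows about. The following is a minimal sketch, assuming the usual "name (flags) => path" line format of ldconfig -p; the function names are hypothetical. As far as I know, the CUDA 11.0 toolkit ships cuSOLVER as libcusolver.so.10 while later 11.x toolkits ship libcusolver.so.11, which would explain why a cuda110 build asks for .so.10, but please treat that as an assumption to verify on your system.

```python
import re
import subprocess

# One cache entry looks like:
#   libcusolver.so.11 (libc6,x86-64) => /usr/local/cuda/.../libcusolver.so.11
ENTRY = re.compile(r"^\s*(\S+)\s+\(.*?\)\s+=>\s+(\S+)\s*$")


def parse_ldconfig(output):
    """Map each soname in `ldconfig -p` output to the first path listed for it."""
    libs = {}
    for line in output.splitlines():
        match = ENTRY.match(line)
        if match:
            libs.setdefault(match.group(1), match.group(2))
    return libs


def installed_variants(libs, stem):
    """Return the sorted sonames that start with `stem`, e.g. 'libcusolver.so'."""
    return sorted(name for name in libs if name.startswith(stem))


def ldconfig_cache():
    """Run `ldconfig -p` (Linux only) and parse its output."""
    result = subprocess.run(["ldconfig", "-p"], capture_output=True, text=True, check=True)
    return parse_ldconfig(result.stdout)
```

For example, installed_variants(ldconfig_cache(), "libcusolver.so") shows at a glance whether libcusolver.so.10 is present or only libcusolver.so.11.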

I have an RTX 3070 running nnabla + ext 11.0. I use a Docker image from the nvcnet example and this is working great.
The Nvidia CUDA version is 11.4.

If I try a pip install on the Nvidia 3090 machine, which has the exact same driver and CUDA versions as the working machine, I get an error when running nnabla, which is looking for the wrong CUDA library version:

ImportError: libcusolver.so.10: cannot open shared object file: No such file or directory

Why is nnabla trying to look for a version 10???

Install:

pip install nnabla-ext-cuda110
Requirement already satisfied: nnabla-ext-cuda110 in /home/ubuntu/anaconda3/lib/python3.8/site-packages (1.28.0)
WARNING: Keyring is skipped due to an exception: Failed to create the collection: Prompt dismissed..
Collecting nnabla==1.28.0
Using cached nnabla-1.28.0-cp38-cp38-manylinux_2_17_x86_64.whl (19.0 MB)
Requirement already satisfied: setuptools in /home/ubuntu/anaconda3/lib/python3.8/site-packages (from nnabla-ext-cuda110) (52.0.0.post20210125)
Requirement already satisfied: six in /home/ubuntu/anaconda3/lib/python3.8/site-packages (from nnabla==1.28.0->nnabla-ext-cuda110) (1.15.0)
Requirement already satisfied: h5py in /home/ubuntu/anaconda3/lib/python3.8/site-packages (from nnabla==1.28.0->nnabla-ext-cuda110) (2.10.0)
Requirement already satisfied: pillow in /home/ubuntu/anaconda3/lib/python3.8/site-packages (from nnabla==1.28.0->nnabla-ext-cuda110) (8.2.0)
Requirement already satisfied: tqdm in /home/ubuntu/anaconda3/lib/python3.8/site-packages (from nnabla==1.28.0->nnabla-ext-cuda110) (4.59.0)
Requirement already satisfied: pyyaml in /home/ubuntu/anaconda3/lib/python3.8/site-packages (from nnabla==1.28.0->nnabla-ext-cuda110) (5.4.1)
Requirement already satisfied: protobuf>=3.6 in /home/ubuntu/anaconda3/lib/python3.8/site-packages (from nnabla==1.28.0->nnabla-ext-cuda110) (3.20.1)
Requirement already satisfied: scipy in /home/ubuntu/anaconda3/lib/python3.8/site-packages (from nnabla==1.28.0->nnabla-ext-cuda110) (1.6.2)
Requirement already satisfied: imageio in /home/ubuntu/anaconda3/lib/python3.8/site-packages (from nnabla==1.28.0->nnabla-ext-cuda110) (2.9.0)
Requirement already satisfied: boto3 in /home/ubuntu/anaconda3/lib/python3.8/site-packages (from nnabla==1.28.0->nnabla-ext-cuda110) (1.22.9)
Requirement already satisfied: numpy in /home/ubuntu/anaconda3/lib/python3.8/site-packages (from nnabla==1.28.0->nnabla-ext-cuda110) (1.20.1)
Requirement already satisfied: configparser in /home/ubuntu/anaconda3/lib/python3.8/site-packages (from nnabla==1.28.0->nnabla-ext-cuda110) (5.2.0)
Requirement already satisfied: contextlib2 in /home/ubuntu/anaconda3/lib/python3.8/site-packages (from nnabla==1.28.0->nnabla-ext-cuda110) (0.6.0.post1)
Requirement already satisfied: Cython in /home/ubuntu/anaconda3/lib/python3.8/site-packages (from nnabla==1.28.0->nnabla-ext-cuda110) (0.29.23)
Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ubuntu/anaconda3/lib/python3.8/site-packages (from boto3->nnabla==1.28.0->nnabla-ext-cuda110) (0.5.2)
Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /home/ubuntu/anaconda3/lib/python3.8/site-packages (from boto3->nnabla==1.28.0->nnabla-ext-cuda110) (1.0.0)
Requirement already satisfied: botocore<1.26.0,>=1.25.9 in /home/ubuntu/anaconda3/lib/python3.8/site-packages (from boto3->nnabla==1.28.0->nnabla-ext-cuda110) (1.25.9)
Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ubuntu/anaconda3/lib/python3.8/site-packages (from botocore<1.26.0,>=1.25.9->boto3->nnabla==1.28.0->nnabla-ext-cuda110) (1.26.4)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ubuntu/anaconda3/lib/python3.8/site-packages (from botocore<1.26.0,>=1.25.9->boto3->nnabla==1.28.0->nnabla-ext-cuda110) (2.8.1)
Installing collected packages: nnabla
Successfully installed nnabla-1.28.0

Error from nnabla on the GeForce 3090:

(base) ubuntu@gc-infallible-goldwasser:/mnt/vol_b/anaconda_installs$ python -c "import nnabla_ext.cuda, nnabla_ext.cudnn"
2022-05-09 15:47:49,600 [nnabla][INFO]: Initializing CPU extension...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/nnabla_ext/cuda/__init__.py", line 131, in <module>
    load_shared_from_error(err)
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/nnabla_ext/cuda/__init__.py", line 67, in load_shared_from_error
    raise err
  File "/home/ubuntu/anaconda3/lib/python3.8/site-packages/nnabla_ext/cuda/__init__.py", line 122, in <module>
    from .init import (
ImportError: libcusolver.so.10: cannot open shared object file: No such file or directory

Nvidia drivers:

(base) ubuntu@gc-infallible-goldwasser:/mnt/vol_b/anaconda_installs$ nvidia-smi
Mon May 9 15:52:08 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:00:05.0 Off |                  N/A |
|  0%   29C    P8    10W / 350W |      5MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       832      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+

Tomonobu-san,

I was using a nvcnet docker setup but this is no longer working. There appears to be a key issue with Nvidia.

So, why does nnabla try pulling .so.10 files when everything is CUDA 11?

Thanks so much for your input,
philip

I was able to get version 1.28 to load by changing the nnabla nvcnet Docker script to the 1.28 version. But when I run it, I get no GPU and an error:

Docker create file:
FROM nnabla/nnabla-ext-cuda:py38-cuda110-v1.28.0
#FROM nnabla/nnabla-ext-cuda-multi-gpu:py38-cuda110-mpi3.1.6-v1.19.0
USER root

ENV HTTP_PROXY ${http_proxy}
ENV HTTPS_PROXY ${https_proxy}

RUN apt-get update
RUN apt-get install -y libsndfile1 git sox
RUN pip install --upgrade pip
RUN pip install tqdm seaborn sklearn librosa numba==0.48.0 matplotlib sox pyloudnorm

Error:
(base) ubuntu@gc-infallible-goldwasser:/mnt/vol_b/TTSplus/Voice/DWavNST$ sudo docker run --gpus all -it --shm-size 1G --rm -v /mnt/vol_b/TTSplus/:/data nvcnet:latest
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

Let me address the Docker issue.

This error indicates that the required components are not installed, and nvidia-docker2 does not appear in the dpkg list you provided.
I assume you already installed Docker by following https://docs.docker.com/engine/install/ubuntu/
So, please install nvidia-docker2 by following https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
After that, you need to restart the Docker service.

Thank you Tomonobu.

I have the nvidia nnabla docker image working, but since I have to install nnabla through a docker it is difficult to add other resources.

For example,
I create a nnabla docker instance. I then install conda in that environment thinking that I can commit the docker container and have both nnabla and the anaconda in a single docker image/container. BUT, the anaconda install is breaking the nnabla install. I think normally it would be better to install anaconda then nnabla, but docker doesn't allow this.

This is why I was hoping to get nnabla working via pip.

Do you know how to add anaconda to the nnabla docker so they both work?

Bests and thanks so much,
philip

Here is an example inside docker container.

nnabla working:
root@6f2b75f43991:/data/TTSplus/nvcnet_test# which nnabla
root@6f2b75f43991:/data/TTSplus/nvcnet_test# python -c "import nnabla

"
2022-05-10 17:13:50,831 [nnabla][INFO]: Initializing CPU extension...

source .bashrc for conda:
root@6f2b75f43991:/data/TTSplus/nvcnet_test# source ~/.bashrc
(base) root@6f2b75f43991:/data/TTSplus/nvcnet_test# conda --version
conda 4.10.1

Check nnabla:
(base) root@6f2b75f43991:/data/TTSplus/nvcnet_test# python -c "import nnabla"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'nnabla'
(base) root@6f2b75f43991:/data/TTSplus/nvcnet_test#

It seems that conda's Python takes over, so the nnabla module is no longer found.

Thanks,
philip

First, an update on libcusolver.so.10.
This library has been linked since nnabla v1.27.0; we are still confirming the details.
In the meantime, there is an article that describes linking to libcusolver.so.11 to avoid this error.
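
A workaround of this kind is usually done by giving the loader a file named libcusolver.so.10 that is really a symlink to the installed libcusolver.so.11. The sketch below is an assumption about what such an article describes, not a confirmed or supported fix: the helper link_soname and the directory layout are hypothetical, and the two library versions are not guaranteed to be ABI compatible, so results should be checked carefully.

```python
import os


def link_soname(existing_lib, wanted_name, target_dir):
    """Create target_dir/wanted_name as a symlink pointing at existing_lib."""
    if not os.path.isfile(existing_lib):
        raise FileNotFoundError(existing_lib)
    os.makedirs(target_dir, exist_ok=True)
    link_path = os.path.join(target_dir, wanted_name)
    # Replace a stale link from an earlier run, but never a real file.
    if os.path.islink(link_path):
        os.remove(link_path)
    os.symlink(os.path.abspath(existing_lib), link_path)
    return link_path


if __name__ == "__main__":
    # Path as reported by ldconfig earlier in this thread; the target
    # directory is a hypothetical private location, not a system path.
    source = "/usr/local/cuda/targets/x86_64-linux/lib/libcusolver.so.11"
    target = os.path.expanduser("~/cuda-compat")
    if os.path.isfile(source):
        print("created", link_soname(source, "libcusolver.so.10", target))
    else:
        print("not found:", source)
```

The private directory then has to be on the loader path when Python starts, for example by setting LD_LIBRARY_PATH to it before running python -c "import nnabla_ext.cuda". Keeping the link out of the system library directories makes it easy to remove later.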

For the Docker image: we used conda (miniconda) before, so I checked our old Dockerfiles.
https://github.com/sony/nnabla-ext-cuda/blob/a0f97ec82f9aebae9dfe331026c7cc806740b439/docker/py37/cuda110/Dockerfile
This simply installs nnabla with pip, but you need to use the pip included in conda, not the one in /usr/bin/.

Conda also appears to provide Docker images: https://hub.docker.com/u/continuumio
Can you install nnabla with pip using these images?
If you cannot, could you please provide the error messages?

Tomonobu,

I tried using both the continuum.io image and your Dockerfile shown above, but I still get errors.

See output below:

Thanks,
-philip

running the continuum.io miniconda then pip install nnabla-ext-cuda110:

ubuntu@gc-infallible-pike:/mnt/vol_b/TTSplus$ sudo docker run -it --shm-size 1G --rm -v /mnt/vol_b:/mnt/vol_b continuumio/miniconda3:latest
(base) root@06c2f2829fa3:/# cd /mnt/vol_b
(base) root@06c2f2829fa3:/mnt/vol_b# ls
TTSplus anaconda3 anaconda_installs lost+found xfer
(base) root@06c2f2829fa3:/mnt/vol_b# pip install nnabla-ext-cuda110
Collecting nnabla-ext-cuda110
Downloading nnabla_ext_cuda110-1.28.0-cp39-cp39-manylinux_2_17_x86_64.whl (60.8 MB)
|████████████████████████████████| 60.8 MB 46.1 MB/s
Requirement already satisfied: setuptools in /opt/conda/lib/python3.9/site-packages (from nnabla-ext-cuda110) (58.0.4)
Collecting nnabla==1.28.0
Downloading nnabla-1.28.0-cp39-cp39-manylinux_2_17_x86_64.whl (18.6 MB)
|████████████████████████████████| 18.6 MB 52.2 MB/s
Collecting protobuf>=3.6
Downloading protobuf-3.20.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.0 MB)
|████████████████████████████████| 1.0 MB 43.1 MB/s
Collecting numpy
Downloading numpy-1.22.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.8 MB)
|████████████████████████████████| 16.8 MB 44.7 MB/s
Collecting h5py
Downloading h5py-3.6.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (4.5 MB)
|████████████████████████████████| 4.5 MB 27.1 MB/s
Collecting scipy
Downloading scipy-1.8.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (42.1 MB)
|████████████████████████████████| 42.1 MB 223 kB/s
Requirement already satisfied: tqdm in /opt/conda/lib/python3.9/site-packages (from nnabla==1.28.0->nnabla-ext-cuda110) (4.62.3)
Collecting contextlib2
Downloading contextlib2-21.6.0-py2.py3-none-any.whl (13 kB)
Collecting pyyaml
Downloading PyYAML-6.0-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (661 kB)
|████████████████████████████████| 661 kB 46.3 MB/s
Collecting imageio
Downloading imageio-2.19.1-py3-none-any.whl (3.4 MB)
|████████████████████████████████| 3.4 MB 47.3 MB/s
Collecting pillow
Downloading Pillow-9.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)
|████████████████████████████████| 4.3 MB 42.3 MB/s
Collecting boto3
Downloading boto3-1.22.12-py3-none-any.whl (132 kB)
|████████████████████████████████| 132 kB 51.2 MB/s
Collecting Cython
Downloading Cython-0.29.28-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
|████████████████████████████████| 1.9 MB 38.7 MB/s
Requirement already satisfied: six in /opt/conda/lib/python3.9/site-packages (from nnabla==1.28.0->nnabla-ext-cuda110) (1.16.0)
Collecting configparser
Downloading configparser-5.2.0-py3-none-any.whl (19 kB)
Collecting botocore<1.26.0,>=1.25.12
Downloading botocore-1.25.12-py3-none-any.whl (8.7 MB)
|████████████████████████████████| 8.7 MB 46.0 MB/s
Collecting jmespath<2.0.0,>=0.7.1
Downloading jmespath-1.0.0-py3-none-any.whl (23 kB)
Collecting s3transfer<0.6.0,>=0.5.0
Downloading s3transfer-0.5.2-py3-none-any.whl (79 kB)
|████████████████████████████████| 79 kB 5.6 MB/s
Collecting python-dateutil<3.0.0,>=2.1
Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
|████████████████████████████████| 247 kB 44.6 MB/s
Requirement already satisfied: urllib3<1.27,>=1.25.4 in /opt/conda/lib/python3.9/site-packages (from botocore<1.26.0,>=1.25.12->boto3->nnabla==1.28.0->nnabla-ext-cuda110) (1.26.7)
Installing collected packages: python-dateutil, jmespath, botocore, s3transfer, pillow, numpy, scipy, pyyaml, protobuf, imageio, h5py, Cython, contextlib2, configparser, boto3, nnabla, nnabla-ext-cuda110
Successfully installed Cython-0.29.28 boto3-1.22.12 botocore-1.25.12 configparser-5.2.0 contextlib2-21.6.0 h5py-3.6.0 imageio-2.19.1 jmespath-1.0.0 nnabla-1.28.0 nnabla-ext-cuda110-1.28.0 numpy-1.22.3 pillow-9.1.0 protobuf-3.20.1 python-dateutil-2.8.2 pyyaml-6.0 s3transfer-0.5.2 scipy-1.8.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
(base) root@06c2f2829fa3:/mnt/vol_b# python -c "import nnabla_ext.cuda, nnabla_ext.cudnn"
2022-05-12 18:18:49,639 [nnabla][INFO]: Initializing CPU extension...
GPU compatibility could not be verified due to a problem getting the GPU list.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.9/site-packages/nnabla_ext/cuda/__init__.py", line 131, in <module>
    load_shared_from_error(err)
  File "/opt/conda/lib/python3.9/site-packages/nnabla_ext/cuda/__init__.py", line 67, in load_shared_from_error
    raise err
  File "/opt/conda/lib/python3.9/site-packages/nnabla_ext/cuda/__init__.py", line 122, in <module>
    from .init import (
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
(base) root@06c2f2829fa3:/mnt/vol_b#

Running the Docker build script with the Dockerfile you suggested:

more ./scripts/docker_build.sh
cd docker
docker build . --rm --no-cache=true -t miniconda4.10.1:latest
ubuntu@gc-infallible-pike:/mnt/vol_b/TTSplus/docker$ more docker/Dockerfile

# Copyright (c) 2020 Sony Corporation. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

FROM nvidia/cuda:11.0-cudnn8-runtime-ubuntu18.04

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        bzip2 \
        ca-certificates \
        curl \
    && rm -rf /var/lib/apt/lists/*

RUN umask 0 \
    && mkdir -p /tmp/deps \
    && cd /tmp/deps \
    && curl -L https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -o Miniconda3-latest-Linux-x86_64.sh \
    && bash Miniconda3-latest-Linux-x86_64.sh -b -p /opt/miniconda3 \
    && rm -rf Miniconda3-latest-Linux-x86_64.sh \
    && PATH=/opt/miniconda3/bin:$PATH \
    && conda install python=3.7 \
    && conda install pip wheel \
    && conda install opencv || true \
    && pip install --extra-index-url https://developer.download.nvidia.com/compute/redist nvidia-dali-cuda110 \
       || echo "Skip DALI installation (CUDA=11.0)" \
    && conda clean -y --all \
    && cd / \
    && rm -rf /tmp/*

ENV PATH /opt/miniconda3/bin:$PATH

ARG NNABLA_VER
RUN pip install nnabla-ext-cuda110==${NNABLA_VER} nnabla_converter==${NNABLA_VER}

sudo ./scripts/docker_build.sh
Sending build context to Docker daemon 3.584kB
Step 1/6 : FROM nvidia/cuda:11.0-cudnn8-runtime-ubuntu18.04
---> 848be2582b0a
Step 2/6 : RUN apt-get update && apt-get install -y --no-install-recommends bzip2 ca-certificates curl && rm -rf /var/lib/apt/lists/*
---> Running in ab3a70afebc5
Get:1 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease [1581 B]
Get:3 http://archive.ubuntu.com/ubuntu bionic InRelease [242 kB]
Err:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease
The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
Ign:4 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 InRelease
Get:5 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release [564 B]
Get:6 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Release.gpg [833 B]
Get:7 http://security.ubuntu.com/ubuntu bionic-security/universe amd64 Packages [1503 kB]
Get:8 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64 Packages [73.8 kB]
Get:9 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages [2761 kB]
Get:10 http://security.ubuntu.com/ubuntu bionic-security/restricted amd64 Packages [932 kB]
Get:11 http://security.ubuntu.com/ubuntu bionic-security/multiverse amd64 Packages [21.1 kB]
Get:12 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
Get:13 http://archive.ubuntu.com/ubuntu bionic-backports InRelease [74.6 kB]
Get:14 http://archive.ubuntu.com/ubuntu bionic/universe amd64 Packages [11.3 MB]
Get:15 http://archive.ubuntu.com/ubuntu bionic/multiverse amd64 Packages [186 kB]
Get:16 http://archive.ubuntu.com/ubuntu bionic/restricted amd64 Packages [13.5 kB]
Get:17 http://archive.ubuntu.com/ubuntu bionic/main amd64 Packages [1344 kB]
Get:18 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages [3195 kB]
Get:19 http://archive.ubuntu.com/ubuntu bionic-updates/multiverse amd64 Packages [29.8 kB]
Get:20 http://archive.ubuntu.com/ubuntu bionic-updates/restricted amd64 Packages [966 kB]
Get:21 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 Packages [2277 kB]
Get:22 http://archive.ubuntu.com/ubuntu bionic-backports/universe amd64 Packages [12.9 kB]
Get:23 http://archive.ubuntu.com/ubuntu bionic-backports/main amd64 Packages [12.2 kB]
Reading package lists...
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease' is not signed.
The command '/bin/sh -c apt-get update && apt-get install -y --no-install-recommends bzip2 ca-certificates curl && rm -rf /var/lib/apt/lists/*' returned a non-zero code: 100

I am unable to get Nvidia's new public key distribution working with this Dockerfile.

I have tried the following as suggested in the nvidia help blogs:

RUN rm /etc/apt/sources.list.d/cuda.list
RUN apt-key del 7fa2af80
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80

and many other variations.

I was able to load the Nvidia nvcr.io/partners/nnabla Docker images, but these fail with a "could not select device driver" error. I'm assuming the cause is the missing .so.10 file, so the container exits.

See below:

ubuntu@gc-infallible-pike:/mnt/vol_b/TTSplus/docker$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
continuumio/miniconda3 latest 8ad7d89d2773 2 weeks ago 400MB
nvcr.io/partners/nnabla v1.24.0-mpi3.1.6 e2ac2a001dac 4 months ago 8.93GB
nvcr.io/partners/nnabla v1.21.0-mpi3.1.6 1c1c66cc0470 8 months ago 8.22GB
nvcr.io/partners/nnabla v1.19.0-mpi3.1.6 1a79191c879b 12 months ago 7.73GB
nvidia/cuda 11.0-cudnn8-runtime-ubuntu18.04 848be2582b0a 19 months ago 3.6GB
ubuntu@gc-infallible-pike:/mnt/vol_b/TTSplus/docker$ sudo docker run -it --shm-size 1G --rm -v /mnt/vol_b:/mnt/vol_b nvcr.io/partners/nnabla:v1.24.0-mpi3.1.6
Failed to detect NVIDIA environment.

== Neural Network Libraries ==

nnabla@cecd69f9b002:~$ exit
exit
ubuntu@gc-infallible-pike:/mnt/vol_b/TTSplus/docker$ sudo docker run --gpus all -it --shm-size 1G --rm -v /mnt/vol_b:/mnt/vol_b nvcr.io/partners/nnabla:v1.24.0-mpi3.1.6
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ubuntu@gc-infallible-pike:/mnt/vol_b/TTSplus/docker$ sudo docker run --gpus all -it --shm-size 1G --rm -v /mnt/vol_b:/mnt/vol_b nvcr.io/partners/nnabla:v1.21.0-mpi3.1.6
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ubuntu@gc-infallible-pike:/mnt/vol_b/TTSplus/docker$ sudo docker run --gpus all -it --shm-size 1G --rm -v /mnt/vol_b:/mnt/vol_b nvcr.io/partners/nnabla:v1.19.0-mpi3.1.6
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ubuntu@gc-infallible-pike:/mnt/vol_b/TTSplus/docker$

Thanks again for your support!
philip

Tomonobu,

You also mentioned an article on getting around the .so version error. I am unable to find the article.

Could you send me its location so I can try that as a fix?

Thanks again,
philip

Hi Philip-san,

W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC

Yes, this error occurs because NVIDIA's GPG key was updated in April.
So, you need to add apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub before apt-get update.

Regarding pip install on miniconda:
I think nnabla and nnabla-ext-cuda were installed successfully.
The miniconda image doesn't include CUDA, so you got this error: ImportError: libcuda.so.1: cannot open shared object file: No such file or directory.
Going back to the first topic: if you want to use both conda and pip, you need to make sure that the pip you run is the one that belongs to conda.
You can check it with pip --version or which pip.
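
The same check can be scripted. This is a minimal sketch, assuming that pip and python from one environment sit in the same bin directory (as they do in /opt/conda/bin); the function name same_environment is hypothetical.

```python
import os
import shutil
import sys


def same_environment(python_path, pip_path):
    """Return True if both executables are in the same directory."""
    if not python_path or not pip_path:
        return False
    python_dir = os.path.dirname(os.path.abspath(python_path))
    pip_dir = os.path.dirname(os.path.abspath(pip_path))
    return python_dir == pip_dir


if __name__ == "__main__":
    pip_path = shutil.which("pip")  # the pip that a bare `pip` command would run
    print("python:", sys.executable)
    print("pip:   ", pip_path)
    if not same_environment(sys.executable, pip_path):
        print("pip and python come from different environments")
```

Running python -m pip install ... instead of a bare pip avoids the question entirely, because it always uses the pip that belongs to the running interpreter.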

Regarding the NGC container (nvcr.io/partners/nnabla), the message you saw is expected.
You need the --gpus option in the docker run arguments to use GPUs; otherwise GPUs are not available inside the container.
Here is an example:
$ sudo docker run -it --shm-size 1G --rm --gpus all -v /mnt/vol_b:/mnt/vol_b nvcr.io/partners/nnabla:v1.24.0-mpi3.1.6

Also, I don't know whether this miniconda image updates its components, but nnabla needs numpy>=1.20.0, so if miniconda has an older numpy, please uninstall it first.
numpy will then be installed when pip installs nnabla.
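
Before uninstalling anything, the installed numpy version can be compared against that minimum. The sketch below is illustrative only: version_tuple and meets_minimum are hypothetical helpers that compare plain numeric versions and ignore pre-release suffixes, which is an assumption that is good enough for a quick check.

```python
from importlib import metadata


def version_tuple(version, width=3):
    """Turn '1.20.1' into (1, 20, 1), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '1.20' compares equal to '1.20.0'.
    parts += [0] * (width - len(parts))
    return tuple(parts)


def meets_minimum(installed, minimum="1.20.0"):
    """Return True if `installed` is at least `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        found = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        print("numpy is not installed; pip will install it together with nnabla")
    else:
        status = "ok" if meets_minimum(found) else "older than 1.20.0"
        print("numpy", found, status)
```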

If you don't need conda, you can use our official Docker images (Python 3.7, 3.8, and 3.9 are available as of now):

  • For a single GPU: nnabla/nnabla-ext-cuda:py38-cuda110-v1.28.0
  • For multiple GPUs: nnabla/nnabla-ext-cuda-multi-gpu:py38-cuda110-mpi3.1.6-v1.28.0

Sorry for covering so many topics. Please let me know if you still get an error.

Tomonobu-san,

We will need conda.

I have installed miniconda-cuda and this works with the GPU. FYI: I managed to break the nvidia-docker and that was causing the issue with no gpus (from previous issue).

I am still getting the .so error after pip installing nnabla-ext-cuda110.

(base) root@8df332a5c4db:/mnt/vol_b# which pip
/opt/conda/bin/pip

Error - libcusolver.so.10:
(base) root@8df332a5c4db:/mnt/vol_b# python -c "import nnabla_ext.cuda, nnabla_ext.cudnn"
2022-05-16 15:38:51,447 [nnabla][INFO]: Initializing CPU extension...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/nnabla_ext/cuda/__init__.py", line 131, in <module>
    load_shared_from_error(err)
  File "/opt/conda/lib/python3.8/site-packages/nnabla_ext/cuda/__init__.py", line 67, in load_shared_from_error
    raise err
  File "/opt/conda/lib/python3.8/site-packages/nnabla_ext/cuda/__init__.py", line 122, in <module>
    from .init import (
ImportError: libcusolver.so.10: cannot open shared object file: No such file or directory
(base) root@8df332a5c4db:/mnt/vol_b#

ls of the dir:
(base) root@8df332a5c4db:/mnt/vol_b# ls /opt/conda/lib/python3.8/site-packages/nnabla_ext/cuda/
doc/ incompatible_gpu_list.py __init__.py nvtx.cpython-38-x86_64-linux-gnu.so utils/
experimental/ init.cpython-38-x86_64-linux-gnu.so libnnabla_cuda-110_8.so __pycache__/ _version.py

Is there any news on the issue with libcusolver.so.10 versus so.11?

Where is the article located that can get around this problem?

Thanks again,
philip

Here is the listing of the installed .so files. I see cuda, cuda-11, and cuda-11.4 directories. All have libcusolver.so.11, not libcusolver.so.10.

I assume libcusolver.so.11 is for CUDA 11 while libcusolver.so.10 is for CUDA 10? Why is nnabla looking for the version 10 .so and not the libcusolver.so.11 that Nvidia installs?

(base) root@8df332a5c4db:/mnt/vol_b# ls /usr/local/cuda
bin compat compute-sanitizer cuda-11.4 doc extras gds include lib64 nvml nvvm share src targets
(base) root@8df332a5c4db:/mnt/vol_b# ls /usr/local/cuda/lib64
libaccinj64.so libcufft.so.10 libcuinj64.so libcusolver.so.11.2.0.120 libnppicc.so.11 libnppim.so.11.4.0.110 libnpps_static.a libnvrtc.so.11.4.152
libaccinj64.so.11.4 libcufft.so.10.5.2.100 libcuinj64.so.11.4 libcusolver_static.a libnppicc.so.11.4.0.110 libnppim_static.a libnvblas.so libnvToolsExt.so
libaccinj64.so.11.4.120 libcufft_static.a libcuinj64.so.11.4.120 libcusparse.so libnppicc_static.a libnppist.so libnvblas.so.11 libnvToolsExt.so.1
libcublasLt.so libcufft_static_nocallback.a libculibos.a libcusparse.so.11 libnppidei.so libnppist.so.11 libnvblas.so.11.5.4.8 libnvToolsExt.so.1.0.0
libcublasLt.so.11 libcufftw.so libcupti.so libcusparse.so.11.6.0.120 libnppidei.so.11 libnppist.so.11.4.0.110 libnvjpeg.so libOpenCL.so
libcublasLt.so.11.5.4.8 libcufftw.so.10 libcupti.so.11.4 libcusparse_static.a libnppidei.so.11.4.0.110 libnppist_static.a libnvjpeg.so.11 libOpenCL.so.1
libcublasLt_static.a libcufftw.so.10.5.2.100 libcupti.so.2021.2.2 liblapack_static.a libnppidei_static.a libnppisu.so libnvjpeg.so.11.5.2.120 libOpenCL.so.1.0
libcublas.so libcufftw_static.a libcupti_static.a libmetis_static.a libnppif.so libnppisu.so.11 libnvjpeg_static.a libOpenCL.so.1.0.0
libcublas.so.11 libcufile_rdma.so libcurand.so libnppc.so libnppif.so.11 libnppisu.so.11.4.0.110 libnvperf_host.so libpcsamplingutil.so
libcublas.so.11.5.4.8 libcufile_rdma.so.1 libcurand.so.10 libnppc.so.11 libnppif.so.11.4.0.110 libnppisu_static.a libnvperf_host_static.a stubs
libcublas_static.a libcufile_rdma.so.1.0.2 libcurand.so.10.2.5.120 libnppc.so.11.4.0.110 libnppif_static.a libnppitc.so libnvperf_target.so
libcudadevrt.a libcufile_rdma_static.a libcurand_static.a libnppc_static.a libnppig.so libnppitc.so.11 libnvptxcompiler_static.a
libcudart.so libcufile.so libcusolverMg.so libnppial.so libnppig.so.11 libnppitc.so.11.4.0.110 libnvrtc-builtins.so
libcudart.so.11.0 libcufile.so.0 libcusolverMg.so.11 libnppial.so.11 libnppig.so.11.4.0.110 libnppitc_static.a libnvrtc-builtins.so.11.4
libcudart.so.11.4.148 libcufile.so.1.0.2 libcusolverMg.so.11.2.0.120 libnppial.so.11.4.0.110 libnppig_static.a libnpps.so libnvrtc-builtins.so.11.4.152
libcudart_static.a libcufile_static.a libcusolver.so libnppial_static.a libnppim.so libnpps.so.11 libnvrtc.so
libcufft.so libcufilt.a libcusolver.so.11 libnppicc.so libnppim.so.11 libnpps.so.11.4.0.110 libnvrtc.so.11.2
(base) root@8df332a5c4db:/mnt/vol_b# ls /usr/local/cuda
bin compat compute-sanitizer cuda-11.4 doc extras gds include lib64 nvml nvvm share src targets
(base) root@8df332a5c4db:/mnt/vol_b# ls /usr/local/cuda-11.4/lib64
libaccinj64.so libcufft.so.10 libcuinj64.so libcusolver.so.11.2.0.120 libnppicc.so.11 libnppim.so.11.4.0.110 libnpps_static.a libnvrtc.so.11.4.152
libaccinj64.so.11.4 libcufft.so.10.5.2.100 libcuinj64.so.11.4 libcusolver_static.a libnppicc.so.11.4.0.110 libnppim_static.a libnvblas.so libnvToolsExt.so
libaccinj64.so.11.4.120 libcufft_static.a libcuinj64.so.11.4.120 libcusparse.so libnppicc_static.a libnppist.so libnvblas.so.11 libnvToolsExt.so.1
libcublasLt.so libcufft_static_nocallback.a libculibos.a libcusparse.so.11 libnppidei.so libnppist.so.11 libnvblas.so.11.5.4.8 libnvToolsExt.so.1.0.0
libcublasLt.so.11 libcufftw.so libcupti.so libcusparse.so.11.6.0.120 libnppidei.so.11 libnppist.so.11.4.0.110 libnvjpeg.so libOpenCL.so
libcublasLt.so.11.5.4.8 libcufftw.so.10 libcupti.so.11.4 libcusparse_static.a libnppidei.so.11.4.0.110 libnppist_static.a libnvjpeg.so.11 libOpenCL.so.1
libcublasLt_static.a libcufftw.so.10.5.2.100 libcupti.so.2021.2.2 liblapack_static.a libnppidei_static.a libnppisu.so libnvjpeg.so.11.5.2.120 libOpenCL.so.1.0
libcublas.so libcufftw_static.a libcupti_static.a libmetis_static.a libnppif.so libnppisu.so.11 libnvjpeg_static.a libOpenCL.so.1.0.0
libcublas.so.11 libcufile_rdma.so libcurand.so libnppc.so libnppif.so.11 libnppisu.so.11.4.0.110 libnvperf_host.so libpcsamplingutil.so
libcublas.so.11.5.4.8 libcufile_rdma.so.1 libcurand.so.10 libnppc.so.11 libnppif.so.11.4.0.110 libnppisu_static.a libnvperf_host_static.a stubs
libcublas_static.a libcufile_rdma.so.1.0.2 libcurand.so.10.2.5.120 libnppc.so.11.4.0.110 libnppif_static.a libnppitc.so libnvperf_target.so
libcudadevrt.a libcufile_rdma_static.a libcurand_static.a libnppc_static.a libnppig.so libnppitc.so.11 libnvptxcompiler_static.a
libcudart.so libcufile.so libcusolverMg.so libnppial.so libnppig.so.11 libnppitc.so.11.4.0.110 libnvrtc-builtins.so
libcudart.so.11.0 libcufile.so.0 libcusolverMg.so.11 libnppial.so.11 libnppig.so.11.4.0.110 libnppitc_static.a libnvrtc-builtins.so.11.4
libcudart.so.11.4.148 libcufile.so.1.0.2 libcusolverMg.so.11.2.0.120 libnppial.so.11.4.0.110 libnppig_static.a libnpps.so libnvrtc-builtins.so.11.4.152
libcudart_static.a libcufile_static.a libcusolver.so libnppial_static.a libnppim.so libnpps.so.11 libnvrtc.so
libcufft.so libcufilt.a libcusolver.so.11 libnppicc.so libnppim.so.11 libnpps.so.11.4.0.110 libnvrtc.so.11.2
(base) root@8df332a5c4db:/mnt/vol_b# ls /usr/local/cuda-11
bin compat compute-sanitizer cuda-11.4 doc extras gds include lib64 nvml nvvm share src targets
(base) root@8df332a5c4db:/mnt/vol_b# ls /usr/local/cuda-11/lib64
libaccinj64.so libcufft.so.10 libcuinj64.so libcusolver.so.11.2.0.120 libnppicc.so.11 libnppim.so.11.4.0.110 libnpps_static.a libnvrtc.so.11.4.152
libaccinj64.so.11.4 libcufft.so.10.5.2.100 libcuinj64.so.11.4 libcusolver_static.a libnppicc.so.11.4.0.110 libnppim_static.a libnvblas.so libnvToolsExt.so
libaccinj64.so.11.4.120 libcufft_static.a libcuinj64.so.11.4.120 libcusparse.so libnppicc_static.a libnppist.so libnvblas.so.11 libnvToolsExt.so.1
libcublasLt.so libcufft_static_nocallback.a libculibos.a libcusparse.so.11 libnppidei.so libnppist.so.11 libnvblas.so.11.5.4.8 libnvToolsExt.so.1.0.0
libcublasLt.so.11 libcufftw.so libcupti.so libcusparse.so.11.6.0.120 libnppidei.so.11 libnppist.so.11.4.0.110 libnvjpeg.so libOpenCL.so
libcublasLt.so.11.5.4.8 libcufftw.so.10 libcupti.so.11.4 libcusparse_static.a libnppidei.so.11.4.0.110 libnppist_static.a libnvjpeg.so.11 libOpenCL.so.1
libcublasLt_static.a libcufftw.so.10.5.2.100 libcupti.so.2021.2.2 liblapack_static.a libnppidei_static.a libnppisu.so libnvjpeg.so.11.5.2.120 libOpenCL.so.1.0
libcublas.so libcufftw_static.a libcupti_static.a libmetis_static.a libnppif.so libnppisu.so.11 libnvjpeg_static.a libOpenCL.so.1.0.0
libcublas.so.11 libcufile_rdma.so libcurand.so libnppc.so libnppif.so.11 libnppisu.so.11.4.0.110 libnvperf_host.so libpcsamplingutil.so
libcublas.so.11.5.4.8 libcufile_rdma.so.1 libcurand.so.10 libnppc.so.11 libnppif.so.11.4.0.110 libnppisu_static.a libnvperf_host_static.a stubs
libcublas_static.a libcufile_rdma.so.1.0.2 libcurand.so.10.2.5.120 libnppc.so.11.4.0.110 libnppif_static.a libnppitc.so libnvperf_target.so
libcudadevrt.a libcufile_rdma_static.a libcurand_static.a libnppc_static.a libnppig.so libnppitc.so.11 libnvptxcompiler_static.a
libcudart.so libcufile.so libcusolverMg.so libnppial.so libnppig.so.11 libnppitc.so.11.4.0.110 libnvrtc-builtins.so
libcudart.so.11.0 libcufile.so.0 libcusolverMg.so.11 libnppial.so.11 libnppig.so.11.4.0.110 libnppitc_static.a libnvrtc-builtins.so.11.4
libcudart.so.11.4.148 libcufile.so.1.0.2 libcusolverMg.so.11.2.0.120 libnppial.so.11.4.0.110 libnppig_static.a libnpps.so libnvrtc-builtins.so.11.4.152
libcudart_static.a libcufile_static.a libcusolver.so libnppial_static.a libnppim.so libnpps.so.11 libnvrtc.so
libcufft.so libcufilt.a libcusolver.so.11 libnppicc.so libnppim.so.11 libnpps.so.11.4.0.110 libnvrtc.so.11.2

Tomonobu-san,

I decided to mv the libcusolver from so.11 to so.10 to see what happens. Now nnabla is looking for a so.8??? Please see below and thanks greatly:

(base) root@8df332a5c4db:/mnt/vol_b# mv /usr/local/cuda-11.4/lib64/libcusolver.so.11 /usr/local/cuda-11.4/lib64/libcusolver.so.10
(base) root@8df332a5c4db:/mnt/vol_b# python -c "import nnabla_ext.cuda, nnabla_ext.cudnn"
2022-05-16 17:13:42,156 [nnabla][INFO]: Initializing CPU extension...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/nnabla_ext/cuda/__init__.py", line 131, in <module>
    load_shared_from_error(err)
  File "/opt/conda/lib/python3.8/site-packages/nnabla_ext/cuda/__init__.py", line 67, in load_shared_from_error
    raise err
  File "/opt/conda/lib/python3.8/site-packages/nnabla_ext/cuda/__init__.py", line 122, in <module>
    from .init import (
ImportError: libcudnn.so.8: cannot open shared object file: No such file or directory
(base) root@8df332a5c4db:/mnt/vol_b#

Tomonobu-san,

Sorry to keep adding on but I noticed this in the /usr/local directory and don't know if this is correct:

Note: Multiple cuda
(base) root@8df332a5c4db:/mnt/vol_b# ls -al /usr/local
total 56
drwxr-xr-x 1 root root 4096 Jan 20 23:11 .
drwxr-xr-x 1 root root 4096 Jan 5 16:47 ..
drwxr-xr-x 2 root root 4096 Jan 5 16:47 bin
lrwxrwxrwx 1 root root 22 Jan 20 23:11 cuda -> /etc/alternatives/cuda
lrwxrwxrwx 1 root root 25 Jan 20 23:11 cuda-11 -> /etc/alternatives/cuda-11
drwxr-xr-x 1 root root 4096 Jan 20 23:26 cuda-11.4
drwxr-xr-x 2 root root 4096 Jan 5 16:47 etc
drwxr-xr-x 2 root root 4096 Jan 5 16:47 games
drwxr-xr-x 2 root root 4096 Jan 5 16:47 include
drwxr-xr-x 2 root root 4096 Jan 5 16:47 lib
lrwxrwxrwx 1 root root 9 Jan 5 16:47 man -> share/man
drwxr-xr-x 2 root root 4096 Jan 5 16:50 sbin
drwxr-xr-x 1 root root 4096 Jan 20 23:11 share
drwxr-xr-x 2 root root 4096 Jan 5 16:47 src

Inside the cuda-11.4 dir (note the cuda-11.4 entry pointing to cuda-11.4):
(base) root@8df332a5c4db:/mnt/vol_b# ls -al /usr/local/cuda-11.4/
total 64
drwxr-xr-x 1 root root 4096 Jan 20 23:26 .
drwxr-xr-x 1 root root 4096 Jan 20 23:11 ..
drwxr-xr-x 3 root root 4096 Jan 20 23:24 bin
drwxr-xr-x 1 root root 4096 Mar 15 09:06 compat
drwxr-xr-x 4 root root 4096 Jan 20 23:24 compute-sanitizer
lrwxrwxrwx 1 root root 9 Jan 20 23:11 cuda-11.4 -> cuda-11.4
drwxr-xr-x 3 root root 4096 Jan 20 23:25 doc
drwxr-xr-x 4 root root 4096 Jan 20 23:24 extras
drwxr-xr-x 2 root root 4096 Jan 20 23:15 gds
lrwxrwxrwx 1 root root 28 Jul 15 2021 include -> targets/x86_64-linux/include
lrwxrwxrwx 1 root root 24 Jul 27 2021 lib64 -> targets/x86_64-linux/lib
drwxr-xr-x 1 root root 4096 Jan 20 23:26 nvml
drwxr-xr-x 7 root root 4096 Jan 20 23:24 nvvm
drwxr-xr-x 3 root root 4096 Jan 20 23:24 share
drwxr-xr-x 1 root root 4096 Mar 15 09:06 src
drwxr-xr-x 1 root root 4096 Jan 20 23:11 targets

Thanks,
philip

One final note:

The original Nvidia CUDA 11.0 has the so.10, but it disappears in later versions like 11.4 and becomes so.11. The GPU needs the newer 11.4 version to run. Is CUDA 11.4+ not supported?

Sorry again for all the input but our production development is stopped due to nnabla not working...

thanks,
philip

Maybe the simplest solution is:

  • install libcudnn8 by apt, and
  • install nnabla_ext_cuda110==1.26.0 by pip

This is because nnabla links to libcusolver.so for Cholesky support, which was introduced in nnabla v1.27.0:
https://blog.nnabla.org/release/v1-27-0/

Please also refer to the following.

I assume libcusolver.so.11 is for CUDA 11 while libcusolver.so.10 is for CUDA 10? Why is nnabla looking for the version 10 .so and not the libcusolver.so.11 that Nvidia installs?

This follows NVIDIA's library versioning.
CUDA 11.0 ships libcusolver.so.10, not libcusolver.so.11; CUDA 11.1 and later ship libcusolver.so.11.
nnabla supports CUDA 11.0 as of now, so nnabla links against libcusolver.so.10.
It's really confusing, and many developers have run into trouble with it.
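To see which of these library versions the dynamic loader can actually find on a given machine, a probe like the following could help. It is a minimal sketch using Python's standard ctypes module; can_load and probe are hypothetical helper names, and the library names are simply the ones that appear in the error messages in this thread.

```python
import ctypes


def can_load(soname):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


def probe(sonames):
    """Map each library name to whether it could be loaded."""
    return {name: can_load(name) for name in sonames}


if __name__ == "__main__":
    names = [
        "libcudart.so.11.0",
        "libcusolver.so.10",
        "libcusolver.so.11",
        "libcudnn.so.8",
    ]
    for name, ok in probe(names).items():
        print(f"{name}: {'found' if ok else 'NOT found'}")
```

Note that a successful probe really loads the library into the current process, so it is best run in a throwaway interpreter rather than inside a training script.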

Here is a thread on Stack Overflow:
https://stackoverflow.com/questions/63199164/how-to-install-libcusolver-so-11
This is not official information from NVIDIA, but I think it's worth a try if you use the latest nnabla.

ImportError: libcudnn.so.8: cannot open shared object file: No such file or directory

Your environment has CUDA, but it seems not to have cuDNN.
I think you need to install libcudnn8; please check our site ( https://nnabla.org/install/ ) for how to set up the environment.

  • Extension: select CUDA 11.0
  • Click "click here" to display how to install CUDA and cuDNN
  • Select the appropriate tab, probably Ubuntu18.04.

The gpu needs the newer 11.4 version to run. Is this cuda version 11.4+ not supported?

As I posted before, we support cuda10.2 and cuda11.0 as of now, and we are investigating support for newer CUDA versions.

But in my understanding, even if the GPU supports the latest CUDA, it can run CUDA 11.0 properly thanks to CUDA backward compatibility.
The nvidia-driver should always be the latest version, but it basically works with either the cuda10 runtime or the cuda11 runtime.

Can I close this issue? If you still face errors, please let me know.
I think this thread is valuable, so I will summarize it before closing.

Tomonobu-san,

Yes, please close. I have nnabla running now. It would be nice for some newer nnabla-cuda versions to be released and tested.

Thank you very much for your input and patience.

Bests,
philip

Thanks philip-san, so please let me summarize this thread.

Error: ImportError: libcusolver.so.10: cannot open shared object file: No such file or directory:

  • This happens because cuda11.0 has libcusolver.so.10, but cuda11.1 and later have libcusolver.so.11.
  • This error may occur with nnabla v1.27.0 and later.
  • Solution1: create a symbolic link named libcusolver.so.10 that points to libcusolver.so.11, if the system has it.
  • Solution2: use nnabla v1.26.0
  • Solution3: install cuda11.0 on your environment
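For Solution1, the link could be created along the lines of this sketch. It is an illustration only, not an official fix: link_soname is a hypothetical helper name, the library directory must be supplied by you, writing into a CUDA directory normally needs root, and nothing guarantees that libcusolver.so.11 is binary compatible with what nnabla expects from libcusolver.so.10. Unlike renaming the file with mv, a link keeps the original libcusolver.so.11 name in place for other software.

```python
import os
import tempfile


def link_soname(lib_dir, existing="libcusolver.so.11", wanted="libcusolver.so.10"):
    """Create lib_dir/wanted as a symlink to lib_dir/existing and return its path.

    Leaves an already existing `wanted` entry untouched and raises
    FileNotFoundError if `existing` is not present in lib_dir.
    """
    target = os.path.join(lib_dir, existing)
    link = os.path.join(lib_dir, wanted)
    if not os.path.exists(target):
        raise FileNotFoundError(target)
    if os.path.lexists(link):
        return link  # do not overwrite anything that is already there
    os.symlink(existing, link)  # relative link, stays valid if lib_dir moves
    return link


if __name__ == "__main__":
    # Demonstration in a throwaway directory with an empty stand-in file,
    # so that no real CUDA installation is modified.
    with tempfile.TemporaryDirectory() as demo_dir:
        open(os.path.join(demo_dir, "libcusolver.so.11"), "w").close()
        created = link_soname(demo_dir)
        print(created, "->", os.readlink(created))
```

On a real system the call would take the directory that holds libcusolver.so.11, and the loader cache may need refreshing with ldconfig afterwards.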

Error: ImportError: libcudnn.so.8: cannot open shared object file: No such file or directory:

  • This error means that cuDNN is not installed on your system
  • Please install cudnn8. You can refer to https://nnabla.org/install to set up the environment.

Error: W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC:

  • This error occurs because the NVIDIA GPG key was changed in Apr 2022
  • To install the new GPG key, please check this PR: sony/nnabla-ext-cuda#401

Using conda environment:

  • You can use nnabla in a conda environment.
  • When using conda, please use the pip that belongs to conda. You can check it with pip --version.
  • Also, please make sure that the numpy version is 1.20.0 or newer.