GPU not recognized in docker image created using Dockerfile.intel
mastrogiovanni opened this issue · 1 comments
mastrogiovanni commented
We have a TITAN RTX GPU correctly configured on an Ubuntu 18.04 machine:
$ nvidia-smi
Sat Mar 7 16:41:19 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN RTX           Off  | 00000000:3B:00.0 Off |                  N/A |
| 41%   34C    P8    13W / 280W |    112MiB / 24212MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2004      G   /usr/lib/xorg/Xorg                            39MiB |
|    0      2120      G   /usr/bin/gnome-shell                          70MiB |
+-----------------------------------------------------------------------------+
We use Docker 19.03.7:
$ docker version
Client: Docker Engine - Community
 Version:           19.03.7
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        7141c199a2
 Built:             Wed Mar 4 01:22:36 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.7
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       7141c199a2
  Built:            Wed Mar 4 01:21:08 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683
Docker is correctly configured to support GPU:
$ docker run --gpus all nvidia/cuda nvidia-smi
Sat Mar 7 15:46:14 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN RTX           Off  | 00000000:3B:00.0 Off |                  N/A |
| 41%   34C    P8    13W / 280W |    112MiB / 24212MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
We built the image with the command:
docker build -t mdt -f containers/Dockerfile.intel .
but when we run:
$ docker run --gpus all mdtold mdt-list-devices
we obtain:
Traceback (most recent call last):
  File "/usr/bin/mdt-list-devices", line 9, in <module>
    load_entry_point('mdt==1.2.2', 'console_scripts', 'mdt-list-devices')()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 542, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2569, in load_entry_point
    return ep.load()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2229, in load
    return self.resolve()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2235, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/lib/python3/dist-packages/mdt/__init__.py", line 8, in <module>
    import mot
  File "/usr/lib/python3/dist-packages/mot/__init__.py", line 3, in <module>
    from .optimize import minimize, get_minimizer_options
  File "/usr/lib/python3/dist-packages/mot/optimize/__init__.py", line 1, in <module>
    from mot.lib.cl_function import SimpleCLFunction
  File "/usr/lib/python3/dist-packages/mot/lib/cl_function.py", line 6, in <module>
    from mot.configuration import CLRuntimeInfo
  File "/usr/lib/python3/dist-packages/mot/configuration.py", line 20, in <module>
    from .lib.cl_environments import CLEnvironmentFactory
  File "/usr/lib/python3/dist-packages/mot/lib/cl_environments.py", line 177, in <module>
    _cl_environment_cache = _initialize_cl_environment_cache()
  File "/usr/lib/python3/dist-packages/mot/lib/cl_environments.py", line 166, in _initialize_cl_environment_cache
    context = cl.Context(devices)
pyopencl.RuntimeError: Context failed: device not available
But when you run mdt-list-devices
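For anyone debugging the same traceback: the failing call is `cl.Context(devices)`, which builds one context over a whole list of devices, so a single unavailable device is enough to make it raise. A minimal sketch that probes each device on its own can help narrow down which one is responsible. The function name `probe_opencl_devices` is made up for this illustration and is not part of MDT or MOT; it only assumes the standard pyopencl calls `get_platforms()`, `Platform.get_devices()`, `Context([...])` and the `pyopencl.Error` base exception.

```python
def probe_opencl_devices(cl):
    """Try to create a separate OpenCL context for every device.

    `cl` is the pyopencl module (or any object exposing the same interface).
    Returns a list of (platform_name, device_name, error_or_None) tuples.
    """
    results = []
    for platform in cl.get_platforms():
        try:
            devices = platform.get_devices()
        except cl.Error as exc:
            # Some platforms raise instead of returning an empty device list.
            results.append((platform.name, None, str(exc)))
            continue
        for device in devices:
            try:
                cl.Context([device])  # one context per device
                results.append((platform.name, device.name, None))
            except cl.Error as exc:
                results.append((platform.name, device.name, str(exc)))
    return results


if __name__ == "__main__":
    try:
        import pyopencl
    except ImportError:
        print("pyopencl is not installed in this environment")
    else:
        try:
            report = probe_opencl_devices(pyopencl)
        except pyopencl.Error as exc:
            # Raised, for example, when no OpenCL platform is registered at all.
            print(f"Could not enumerate OpenCL platforms: {exc}")
        else:
            for platform_name, device_name, error in report:
                status = "ok" if error is None else f"FAILED: {error}"
                print(f"{platform_name} / {device_name}: {status}")
```

Saved under a name of your choice and run inside the container with the same `docker run --gpus all ...` invocation, this should show whether the NVIDIA platform is visible to OpenCL at all, or whether another device in the list is the one that cannot be opened.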
robbert-harms commented
Closed by adding a specific nvidia dockerfile. Thank you for providing the dockerfile.
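A note for readers who land here with a similar setup: the thread does not say what the NVIDIA-specific Dockerfile changes, so the following is only a guess at one common cause. On Linux, the OpenCL ICD loader discovers vendor drivers through `.icd` files, by default in `/etc/OpenCL/vendors`, and an image built around another vendor's runtime may not register the NVIDIA one. The sketch below lists what is registered; `list_opencl_icds` is a hypothetical helper written for this example, not something shipped with MDT.

```python
from pathlib import Path


def list_opencl_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Return {icd_file_name: library_named_inside} for each registered ICD.

    Returns an empty dict when the directory does not exist.
    """
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return {}
    icds = {}
    for icd_file in sorted(directory.glob("*.icd")):
        # Each .icd file holds the name or path of the vendor's OpenCL library.
        icds[icd_file.name] = icd_file.read_text().strip()
    return icds


if __name__ == "__main__":
    found = list_opencl_icds()
    if not found:
        print("No OpenCL ICD files found")
    for name, library in found.items():
        print(f"{name}: {library}")
```

If no NVIDIA entry shows up when this runs inside the container, that would be consistent with the GPU being visible to `nvidia-smi` but not to pyopencl, though other causes are possible.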