robbert-harms/MDT

GPU not recognized in docker image created using Dockerfile.intel

mastrogiovanni opened this issue · 1 comment

We have a TITAN RTX GPU correctly configured on an Ubuntu 18.04 machine:

$ nvidia-smi

Sat Mar  7 16:41:19 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN RTX           Off  | 00000000:3B:00.0 Off |                  N/A |
| 41%   34C    P8    13W / 280W |    112MiB / 24212MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2004      G   /usr/lib/xorg/Xorg                            39MiB |
|    0      2120      G   /usr/bin/gnome-shell                          70MiB |
+-----------------------------------------------------------------------------+

We use Docker 19.03.7:

$ docker version

Client: Docker Engine - Community
 Version:           19.03.7
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        7141c199a2
 Built:             Wed Mar  4 01:22:36 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.7
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       7141c199a2
  Built:            Wed Mar  4 01:21:08 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Docker is correctly configured for GPU access:

$ docker run --gpus all nvidia/cuda nvidia-smi

Sat Mar  7 15:46:14 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN RTX           Off  | 00000000:3B:00.0 Off |                  N/A |
| 41%   34C    P8    13W / 280W |    112MiB / 24212MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

We built the image with the command:

docker build -t mdt -f containers/Dockerfile.intel .

However, when we run:

$ docker run --gpus all mdtold mdt-list-devices

we get the following error:

Traceback (most recent call last):
  File "/usr/bin/mdt-list-devices", line 9, in <module>
    load_entry_point('mdt==1.2.2', 'console_scripts', 'mdt-list-devices')()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 542, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2569, in load_entry_point
    return ep.load()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2229, in load
    return self.resolve()
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 2235, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/usr/lib/python3/dist-packages/mdt/__init__.py", line 8, in <module>
    import mot
  File "/usr/lib/python3/dist-packages/mot/__init__.py", line 3, in <module>
    from .optimize import minimize, get_minimizer_options
  File "/usr/lib/python3/dist-packages/mot/optimize/__init__.py", line 1, in <module>
    from mot.lib.cl_function import SimpleCLFunction
  File "/usr/lib/python3/dist-packages/mot/lib/cl_function.py", line 6, in <module>
    from mot.configuration import CLRuntimeInfo
  File "/usr/lib/python3/dist-packages/mot/configuration.py", line 20, in <module>
    from .lib.cl_environments import CLEnvironmentFactory
  File "/usr/lib/python3/dist-packages/mot/lib/cl_environments.py", line 177, in <module>
    _cl_environment_cache = _initialize_cl_environment_cache()
  File "/usr/lib/python3/dist-packages/mot/lib/cl_environments.py", line 166, in _initialize_cl_environment_cache
    context = cl.Context(devices)
pyopencl.RuntimeError: Context failed: device not available
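The traceback shows that MOT creates a `cl.Context` while `mot/lib/cl_environments.py` is being imported (line 166), so a single device that cannot be opened aborts `import mdt` entirely. A minimal diagnostic sketch, assuming `pyopencl` is importable inside the container, is to create a context for each device separately and report which platform/device fails. The function `probe_opencl_devices` and its two optional parameters are hypothetical (they are not part of MDT or MOT); the parameters exist only so the logic can be exercised with fake objects when no GPU is present.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def probe_opencl_devices(
    platforms: Optional[Iterable] = None,
    make_context: Optional[Callable] = None,
) -> List[Tuple[str, str, bool, str]]:
    """Return (platform, device, context_ok, message) for every OpenCL device.

    Hypothetical helper: by default it uses pyopencl, but platforms and the
    context factory can be injected (assumption made for testability).
    """
    if platforms is None or make_context is None:
        import pyopencl as cl  # imported lazily so fakes work without pyopencl

        if platforms is None:
            platforms = cl.get_platforms()
        if make_context is None:
            def _pyopencl_context(device):
                # Same call MOT makes, but restricted to a single device.
                return cl.Context([device])

            make_context = _pyopencl_context

    results = []
    for platform in platforms:
        try:
            devices = platform.get_devices()
        except Exception as exc:  # e.g. no devices on this platform
            results.append((platform.name, "<none>", False, str(exc)))
            continue
        for device in devices:
            try:
                make_context(device)
                results.append((platform.name, device.name, True, "ok"))
            except Exception as exc:  # pyopencl raises cl.RuntimeError here
                results.append((platform.name, device.name, False, str(exc)))
    return results


if __name__ == "__main__":
    for plat, dev, ok, msg in probe_opencl_devices():
        print(f"{'OK  ' if ok else 'FAIL'} {plat} / {dev}: {msg}")
```

Saved as a script inside the image (the filename is up to you), it could be run with `docker run --gpus all <image> python3 <script>` to see whether the failing device belongs to the Intel runtime or whether the NVIDIA GPU is missing from the list altogether.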

But when you run mdt-list-devices

Closed by adding a dedicated NVIDIA Dockerfile. Thank you for providing the Dockerfile.
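On Linux, the Khronos OpenCL ICD loader discovers vendor runtimes through `*.icd` files in `/etc/OpenCL/vendors`, each of which names a shared library. `--gpus all` makes the NVIDIA driver available, but OpenCL only sees the NVIDIA runtime if some ICD file names `libnvidia-opencl.so.1`, as NVIDIA's `nvidia/opencl` images arrange. The NVIDIA Dockerfile that closed this issue is not shown here, so the following is only a hedged sketch for checking which runtimes an image registers; `registered_icds` is a hypothetical helper, not part of MDT.

```python
from pathlib import Path
from typing import Dict


def registered_icds(vendor_dir: str = "/etc/OpenCL/vendors") -> Dict[str, str]:
    """Map each *.icd file name to the library it points at (first line).

    Hypothetical helper; assumes the standard Linux ICD directory layout.
    """
    result: Dict[str, str] = {}
    directory = Path(vendor_dir)
    if not directory.is_dir():
        return result  # no ICD directory: the loader finds no platforms
    for icd in sorted(directory.glob("*.icd")):
        text = icd.read_text().strip()
        result[icd.name] = text.splitlines()[0].strip() if text else ""
    return result


if __name__ == "__main__":
    icds = registered_icds()
    for name, lib in icds.items():
        print(f"{name}: {lib}")
    if not any("libnvidia-opencl" in lib for lib in icds.values()):
        print("No ICD names libnvidia-opencl; OpenCL probably cannot see NVIDIA GPUs.")
```

Running this in both the Intel-based and the NVIDIA-based image would show whether the difference between them is the registered ICD.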