anibali/docker-pytorch

cuda:10 and cuda:9

TJJTJJTJJ opened this issue · 2 comments

My computer (host) has CUDA 9.0 installed, and I have also installed the NVIDIA Container Toolkit.

When I use your image anibali/pytorch:cuda-9.0, it works fine.

root@zbp-PowerEdge-T630:~# docker run -it --gpus all anibali/pytorch:cuda-9.0 /bin/bash
user@91fafcb855d7:/app$ python
Python 3.6.5 |Anaconda, Inc.| (default, Apr 29 2018, 16:14:56) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> 

But when I use your image anibali/pytorch:cuda-10.0, it fails.

root@zbp-PowerEdge-T630:~# docker run -it --gpus all anibali/pytorch:cuda-10.0 /bin/bash
docker: Error response from daemon: OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.0\\\\n\\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled 

In my opinion, the PyTorch Docker image should be independent of the CUDA version installed on my computer.
Is there something I missed? Thank you.

You are correct that the version of CUDA installed on the host should not matter. The driver version does matter, though: the minimum required for CUDA 10.0 on Linux is 410.48 according to the documentation.
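If you are unsure which driver version you have, nvidia-smi prints it in its header; as a quick sketch (assuming a reasonably recent nvidia-smi on your PATH), the query flags below will print just the driver version:

$ nvidia-smi --query-gpu=driver_version --format=csv,noheader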

Once you confirm that your driver version is new enough, you should narrow things down to check whether the problem is specific to this image or not. Can you try running a plain CUDA 10.0 image to see whether that works? If it fails with the same error, the issue is with your host setup rather than this image.

$ docker run --gpus all nvidia/cuda:10.0-base nvidia-smi

Closing due to lack of response.