moby/moby

NVIDIA GPU support

3XX0 opened this issue · 39 comments

3XX0 commented

Hello, author of nvidia-docker here.

As many of you may know, we recently released our NVIDIA Docker project in an effort to enable GPUs in containerized applications (mostly deep learning). The project currently consists of two parts:

  • A Docker volume plugin to mount NVIDIA driver files inside containers at runtime.
  • A small wrapper around Docker to ease the deployment of our images.

More information on this here

While it has been working great so far, now that Docker 1.12 is coming out with a configurable runtime and complete OCI support, we would like to move away from this approach (which is admittedly hacky) and work on something that is better integrated with Docker.

The way I see it, we would provide a prestart OCI hook that triggers our implementation and configures the cgroups/namespaces correctly.

However, there are several things we need to solve first, specifically:

  1. How to detect if a given image needs GPU support
    Currently, we are using a special label com.nvidia.volumes.needed, but it is not exported as an OCI annotation (see #21324)
  2. How to pass down to the hook which GPU should be isolated
    Currently, we are using an environment variable NV_GPU
  3. How to check whether the image is compatible with the current driver or not.
    Currently, we are using a special label XXX_VERSION

All of the above could be solved using environment variables, but I'm not particularly fond of this idea (e.g. docker run -e NVIDIA_GPU=0,1 nvidia/cuda).

So is there a way to pass runtime/hook parameters from the docker command line, and if not, would it be worth adding one (e.g. --runtime-opt)?
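
For illustration, a hypothetical invocation with such a flag might look like the line below; note that --runtime-opt and the nvidia.gpu key are placeholders, not existing Docker options:

docker run --runtime-opt nvidia.gpu=0,1 nvidia/cuda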

flx42 commented

+@cpuguy83 we briefly discussed that with you during DockerCon 16.

3XX0 commented

Worth noting that opencontainers/runtime-spec#483 might affect this

@3XX0 I guess another option would be to have a patched version of runc that has different options and knows about GPUs?

The image detection is similar to an issue we have for multiarch in general; there are flags on Hub, but they are not well exposed yet. This might also work for the driver version; let me find the spec.

3XX0 commented

Yes, I thought about that too, but I would rather not have to track upstream runc, and we would still hit the same problems since the runtime config in Docker is daemon-wide and static.

Thanks, it would be greatly appreciated.

flx42 commented

I just saw that #24750 was closed and redirected here.
I believe we could already have basic GPU support with Docker Swarm if we were able to add devices when using docker service create. Is it on the roadmap?

Related: NVIDIA/nvidia-docker#141

@3XX0 @flx42 I am the original author of #24750. I am not sure whether this question is appropriate to post here; if not, please forgive me. If I just want Docker Swarm to orchestrate a cluster with GPU support, without caring whether it uses native Docker or nvidia-docker, could you give some comments or suggestions? Thanks in advance!

@flx42 I think you may be able to bind mount device nodes with docker service --mount, but it is not very well documented yet as I think the CLI is still being finalised; I am fairly sure the API allows bind mounts, though.

--mount type=bind,source=/host/path,target=/container/path

flx42 commented

@justincormack @cpuguy83 Yes, in NVIDIA/nvidia-docker#141 I figured out I can mount the user-level driver files like this:

$ docker service create --mount type=volume,source=nvidia_driver_367.35,target=/usr/local/nvidia,volume-driver=nvidia-docker [...]

But, unless I'm missing something, you can't bind mount a device, it seems to be like a mknod but without the proper device cgroup whitelisting.

$ docker service create --mount type=bind,source=/dev/nvidiactl,target=/dev/nvidiactl ubuntu:14.04 sh -c 'echo foo > /dev/nvidiactl'
$ docker logs stupefied_kilby.1.2445ld28x6ooo0rjns26ezsfg
sh: 1: cannot create /dev/nvidiactl: Operation not permitted

It's probably similar to doing something like this:

$ docker run -ti ubuntu:14.04
root@76d4bb08b07c:/# mknod -m 666 /dev/nvidiactl c 195 255
root@76d4bb08b07c:/# echo foo > /dev/nvidiactl
bash: /dev/nvidiactl: Operation not permitted

Whereas the following works (well, invalid arg is normal):

$ docker run -ti --device /dev/nvidiactl ubuntu:14.04
root@ea53a1b96226:/# echo foo > /dev/nvidiactl
bash: echo: write error: Invalid argument

flx42 commented

@NanXiao regarding your question, please look at NVIDIA/nvidia-docker#141

@flx42 ah yes, that would be an issue. Can you create a separate issue about not being able to add a device to a service, so we can track that specific problem?

flx42 commented

@justincormack created #24865, thank you!

A few comments:

How to detect if a given image needs GPU support

The way we handled this for multi-arch was by explicitly introducing the "arch" field into the image. I would suggest that we introduce an "accelerator" field to address not only GPUs but also, in the future, FPGAs and other accelerators.
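
For reference, the label-based detection used today can be checked by inspecting an image's labels; a rough sketch, assuming the image carries the com.nvidia.volumes.needed label mentioned in the first comment:

$ docker inspect --format '{{ index .Config.Labels "com.nvidia.volumes.needed" }}' nvidia/cuda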

On the compatibility check, I would make it optional. A lot of applications can run with or without GPUs: if GPUs are present they will take advantage of them, and if not they will just run in CPU-only mode. Making the driver check optional will make it easy to accommodate this requirement.

Any update here? Really looking forward to a standard way to use accelerators in containers :)

icy commented

See also kubernetes/kubernetes#19049. k8s is going to release a new version with GPU support.

Swarm is very good for our system (k8s has things we don't need). However, GPU support is definitely a key feature, and if Swarm doesn't have a clear plan for it we will have to go with k8s :D

Hey guys, I would really love to use a GPU-enabled Swarm. This issue is still open, so I guess it's not clear whether this will be on the roadmap or not?! Any news on this topic?

thx for the update @justincormack

@3XX0 @flx42

What are the low-level steps to give a container access to the GPUs? I was looking through your codebase and saw that there are devices that must be added, as well as volumes, but I was not sure what the volumes are used for. Are they for the libs?

Other than placing the devices inside, are there any other settings that need to be applied?

3XX0 commented

It's actually tricky; we explain most of it here, but there are a lot of corner cases to think about.
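
Roughly, the minimum today is the NVIDIA device nodes plus the user-level driver libraries exposed through the volume plugin; a simplified sketch (reusing the driver version and volume name from the earlier docker service example, so the exact values are illustrative) would be:

$ docker run -ti --device /dev/nvidiactl --device /dev/nvidia-uvm --device /dev/nvidia0 \
    --volume-driver=nvidia-docker -v nvidia_driver_367.35:/usr/local/nvidia:ro \
    nvidia/cuda nvidia-smi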

We've been working on a library to abstract all these things; the idea is to integrate it as a runc prestart hook. We have most of it working now and will publish it soon. The only issue with this approach is that we have to fork runc and rely on Docker's --runtime option.

In the long term, we should probably think about a native GPU integration leveraging something like --exec-opt.

@3XX0 if you want, we can work together on the runc changes. I think it makes sense to have some support for GPUs at that level, and it will make implementations much cleaner at the higher levels.

3XX0 commented

It should be pretty straightforward; the only things that need to be done are:

  1. Fix this opencontainers/runc#1044
  2. Append some hooks in the spec

Once 1) is fixed and we have an option to add custom hooks from within Docker (e.g. exec-opt), we won't need the fork anymore (except for backward compatibility).

@3XX0 why use hooks at all? Can you not populate everything in the spec itself to add the devices, add bind mounts, and give the correct permissions on the device level?

@3XX0 if we have a docker run --gpus what would the input data look like?

3XX0 commented

Our solution is runtime agnostic and hooks are perfect for that.
We also need to do more than what's exposed by the spec (e.g. update the library cache).

3XX0 commented

Right now we use an environment variable with a list of comma-separated IDs and/or UUIDs (similar to nvidia-docker's NV_GPU). It allows us to stay backward compatible with all Docker versions and to encode the GPUs required by a given image (e.g. ENV NVIDIA_GPU=any or ENV NVIDIA_GPU=all), and we can override it on the command line:

docker run --runtime=nvidia -e NVIDIA_GPU=0,1 nvidia/cuda

3XX0 commented

@crosbymichael We also hit some limitations with Docker; for example, a lot of GPU images need a large shm size and high memlock limits (our drivers need those). Not sure how to address that at the image level (Docker is not even relying on the OCI spec for /dev/shm).
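
As a concrete example, GPU workloads today often end up needing something along these lines (the values are illustrative):

docker run --runtime=nvidia --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 nvidia/cuda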

The workaround is to configure everything at the daemon level once #29492 is fixed, but it's far from ideal.

3XX0 commented

@justincormack @crosbymichael Do you have a timeline on the containerd 1.0 release and integration in Docker?

Right now the only option we have is to integrate at the runc level given that the containerd "runtime" is hardcoded. I would rather do it with containerd 1.0 if Docker were to support it.

@3XX0 containerd 1.0 is now integrated into Docker.

@3XX0 thanks for your excellent work!
I have some questions about nvidia-docker.
For example, assume a PC has two NVIDIA GPUs, GPU A and GPU B, where the index of GPU A is 0 and the index of GPU B is 1. When I execute the following command:

NV_GPU='0' nvidia-docker run -d nginx

can this container use GPU B? From your comments above, it should not.
Since my PC has only one NVIDIA GPU, I can't try this myself to confirm it.
However, I have tried running the following two commands:

NV_GPU='0' nvidia-docker run -d nginx

docker run -d nginx

and I did not notice any difference between the two containers.
In both containers I can get the information of my GPU.
Did I miss something, or does it just work like this? Looking forward to your reply, thanks!

flx42 commented

@WanLinghao Inside the docker run -d nginx container, you should not see the GPUs, unless you have a special configuration. Can you double-check?

@flx42 I have tried three kinds of commands:
1. docker run -d nginx
2. docker run --privileged=true -d nginx
3. NV_GPU='0' nvidia-docker run -d nginx

Then I opened a shell in each container and executed
find / -name '*nvidia*'
I got the following results:

First and third containers:

/sys/bus/pci/drivers/nvidia
/sys/kernel/slab/nvidia_pte_cache
/sys/kernel/slab/nvidia_p2p_page_cache
/sys/kernel/slab/nvidia_stack_cache
/sys/module/drm/holders/nvidia_drm
/sys/module/drm_kms_helper/holders/nvidia_drm
/sys/module/nvidia_modeset
/sys/module/nvidia_modeset/holders/nvidia_drm
/sys/module/nvidia
/sys/module/nvidia/drivers/pci:nvidia
/sys/module/nvidia/holders/nvidia_modeset
/sys/module/nvidia/holders/nvidia_uvm
/sys/module/nvidia_drm
/sys/module/nvidia_uvm
/proc/irq/33/nvidia
/proc/driver/nvidia
/proc/driver/nvidia-uvm

Second container:

/dev/nvidiactl
/dev/nvidia0
/dev/nvidia-uvm-tools
/dev/nvidia-uvm
/dev/nvidia-modeset
/sys/bus/pci/drivers/nvidia
/sys/kernel/slab/nvidia_pte_cache
/sys/kernel/slab/nvidia_p2p_page_cache
/sys/kernel/slab/nvidia_stack_cache
/sys/module/drm/holders/nvidia_drm
/sys/module/drm_kms_helper/holders/nvidia_drm
/sys/module/nvidia_modeset
/sys/module/nvidia_modeset/holders/nvidia_drm
/sys/module/nvidia
/sys/module/nvidia/drivers/pci:nvidia
/sys/module/nvidia/holders/nvidia_modeset
/sys/module/nvidia/holders/nvidia_uvm
/sys/module/nvidia_drm
/sys/module/nvidia_uvm
/proc/irq/33/nvidia
/proc/driver/nvidia
/proc/driver/nvidia-uvm

As you can see, there is no difference between commands 1 and 3.
Can you give more information about this?
Looking forward to your reply, thanks!

flx42 commented

Ah yes, nvidia-docker (version 1.0) acts as a passthrough to docker for this nginx image. We enable GPU support only for images that are based on our nvidia/cuda images from Docker Hub; we detect whether the image has a special label for this purpose.
Note that version 2.0 of nvidia-docker behaves differently (documented in our README).

So, try again with nvidia/cuda as the image.
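
For example, assuming nvidia-docker 1.0 and the nvidia/cuda image from Docker Hub, something like the following should print the GPU information inside the container:

$ nvidia-docker run --rm nvidia/cuda nvidia-smi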

Any news on this? Would ❤️ a LinuxKit distro with nvidia-docker onboard 🙌✨
//cc @3XX0 @justincormack

flx42 commented

Trying to revive interest in supporting OCI hooks here: #36987
If there is interest, we can close this issue once it's implemented.

Docker 19.03 has docker run --gpus. Closing this issue.
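
For reference, a minimal example with the new flag (the image tag is illustrative):

$ docker run --gpus all nvidia/cuda:10.0-base nvidia-smi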