NVIDIA GPU support
3XX0 opened this issue · 39 comments
Hello, author of nvidia-docker here.
As many of you may know, we recently released our NVIDIA Docker project in an effort to enable GPUs in containerized applications (mostly Deep Learning). This project currently consists of two parts:
- A Docker volume plugin to mount NVIDIA driver files inside containers at runtime.
- A small wrapper around Docker to ease the deployment of our images.
More information on this here
While it has been working great so far, now that Docker 1.12 is coming out with a configurable runtime and complete OCI support, we would like to move away from this approach (which is admittedly hacky) and work on something better integrated with Docker.
The way I see it, we would provide a prestart OCI hook that triggers our implementation and configures the cgroups/namespaces correctly.
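For reference, a minimal sketch of what such a prestart hook entry looks like in the bundle's OCI config.json (the hook binary name and path below are illustrative placeholders, not our actual implementation; per the runtime spec, the runtime invokes the hook with the container state on stdin):
$ cat config.json
...
  "hooks": {
    "prestart": [
      { "path": "/usr/bin/nvidia-gpu-hook", "args": ["nvidia-gpu-hook", "prestart"] }
    ]
  },
...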
However, there are several things we need to solve first, specifically:
- How to detect if a given image needs GPU support.
  Currently, we are using a special label com.nvidia.volumes.needed, but it is not exported as an OCI annotation (see #21324).
- How to pass down to the hook which GPU should be isolated.
  Currently, we are using an environment variable NV_GPU.
- How to check whether the image is compatible with the current driver or not.
  Currently, we are using a special label XXX_VERSION.
All of the above could be solved using environment variables, but I'm not particularly fond of this idea (e.g. docker run -e NVIDIA_GPU=0,1 nvidia/cuda).
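For context, this is roughly how the current approach surfaces on the command line; the label value shown below is what our nvidia/cuda images set and may differ between image versions:
$ docker inspect --format '{{ index .Config.Labels "com.nvidia.volumes.needed" }}' nvidia/cuda
nvidia_driver
$ NV_GPU=0,1 nvidia-docker run --rm nvidia/cuda nvidia-smi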
So is there a way to pass runtime/hook parameters from the docker command line, and if not, would it be worth adding one (e.g. --runtime-opt)?
ping @mlaventure @crosbymichael wdyt?
Worth noting that opencontainers/runtime-spec#483 might affect this
@3XX0 I guess another option would be to have a patched version of runc that has different options and knows about GPUs?
The image detection is a similar issue we have for multiarch in general, and there are flags on hub, but not well exposed yet. This might also work for driver version, let me find the spec.
Yes, I thought about it too, but I would rather not have to track upstream runc, and we would still hit the same problems since the runtime config in Docker is daemon-wide and static.
Thanks, it would be greatly appreciated.
I just saw that #24750 was closed and redirected here.
I believe we could already have basic GPU support with Docker Swarm if we were able to add devices when using docker service create. Is it on the roadmap?
Related: NVIDIA/nvidia-docker#141
@3XX0 @flx42 I am the original author of #24750. I am not sure whether this question is appropriate to post here; if not, please forgive me. If I just want to use Docker Swarm to orchestrate a Swarm cluster with GPU support, without caring whether it uses native Docker or nvidia-docker, could you give some comments or suggestions? Thanks in advance!
@flx42 I think you may be able to bind mount device nodes with docker service --mount, but it is not very well documented yet as I think the CLI is still being finalised; I am fairly sure the API allows bind mounts though.
--mount type=bind,source=/host/path,target=/container/path
@cpuguy83 thanks!
@justincormack @cpuguy83 Yes, in NVIDIA/nvidia-docker#141 I figured out I can mount the user-level driver files like this:
$ docker service create --mount type=volume,source=nvidia_driver_367.35,target=/usr/local/nvidia,volume-driver=nvidia-docker [...]
But, unless I'm missing something, you can't bind mount a device; it seems to behave like a mknod but without the proper device cgroup whitelisting.
$ docker service create --mount type=bind,source=/dev/nvidiactl,target=/dev/nvidiactl ubuntu:14.04 sh -c 'echo foo > /dev/nvidiactl'
$ docker logs stupefied_kilby.1.2445ld28x6ooo0rjns26ezsfg
sh: 1: cannot create /dev/nvidiactl: Operation not permitted
It's probably similar to doing something like this:
docker run -ti ubuntu:14.04
root@76d4bb08b07c:/# mknod -m 666 /dev/nvidiactl c 195 255
root@76d4bb08b07c:/# echo foo > /dev/nvidiactl
bash: /dev/nvidiactl: Operation not permitted
Whereas the following works (the "Invalid argument" error is expected):
$ docker run -ti --device /dev/nvidiactl ubuntu:14.04
root@ea53a1b96226:/# echo foo > /dev/nvidiactl
bash: echo: write error: Invalid argument
@NanXiao regarding your question, please look at NVIDIA/nvidia-docker#141
@flx42 ah yes, that would be an issue. Can you open a separate issue about not being able to add a device to a service, to track that specific problem?
@justincormack created #24865, thank you!
A few comments.
On how to detect if a given image needs GPU support: the way we handled this for multi-arch was by explicitly introducing the "arch" field into the image. I would suggest introducing an "accelerator" field to address not only GPUs but, in the future, FPGAs and other accelerators.
On the compatibility check, I would implement it such that it is optional. A lot of applications can run with or without GPUs: if GPUs are present, they will take advantage of them; if not, they will just run in CPU-only mode. Making the driver check optional will make it easy to accommodate this requirement.
Any update here? Really looking forward to a standard way to use accelerators in containers :)
See also kubernetes/kubernetes#19049. k8s is going to release a new version with GPU support.
Swarm is very good for our system (k8s has things we don't need). However, GPU support is definitely a key feature, and if Swarm doesn't have a clear plan for it we will have to go with k8s :D
Hey guys, I would really love to use a GPU-enabled Swarm. This issue is still open, so I guess it's not clear whether this will be on the roadmap or not? Any news on this topic?
thx for the update @justincormack
What are the low-level steps to give a container access to the GPUs? I was looking through your codebase and saw that there are devices that must be added, as well as volumes, but I was not sure what the volumes were being used for. Is that for the libs?
Other than placing the devices inside, are there any other settings that need to be applied?
It's actually tricky; we explain most of it here, but there are a lot of corner cases to think about.
We've been working on a library to abstract all of this; the idea is to integrate it as a RunC prestart hook. We have most of it working now and will publish it soon. The only issue with this approach is that we have to fork RunC and rely on Docker's --runtime option (a sketch of this registration is shown below).
In the long term, we should probably think about a native GPU integration leveraging something like --exec-opt.
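For illustration, the --runtime registration mentioned above would look roughly like this (the runtime name and binary path are placeholders for our patched RunC):
$ cat /etc/docker/daemon.json
{
  "runtimes": {
    "nvidia": {
      "path": "/usr/local/bin/nvidia-runc"
    }
  }
}
$ docker run --runtime=nvidia nvidia/cuda nvidia-smi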
@3XX0 if you want, we can work together on the runc changes. I think it makes sense to have some support for GPUs at that level, and it will make implementations much cleaner at the higher levels.
It should be pretty straightforward; the only things that need to be done are:
- Fix opencontainers/runc#1044
- Append some hooks in the spec
Once the first item is fixed and we have an option to add custom hooks from within Docker (e.g. exec-opt), we won't need the fork anymore (except for backward compatibility).
@3XX0 why use hooks at all? Can you not populate everything in the spec itself to add the devices, add bind mounts, and give the correct permissions on the device level?
@3XX0 if we have a docker run --gpus option, what would the input data look like?
Our solution is runtime agnostic and hooks are perfect for that.
We also need to do more than what's exposed by the spec (e.g. update the library cache).
Right now we use an environment variable with a list of comma-separated IDs and/or UUIDs (similar to nvidia-docker's NV_GPU). It allows us to be backward compatible with all Docker versions, to encode the GPUs required by a given image (i.e. ENV NVIDIA_GPU=any or ENV NVIDIA_GPU=all), and to override them on the command line:
docker run --runtime=nvidia -e NVIDIA_GPU=0,1 nvidia/cuda
@crosbymichael We also hit some limitations with Docker. For example, a lot of GPU images need large shm sizes and memlock limits (our drivers need those), and I'm not sure how to address that at the image level (Docker is not even relying on the OCI spec for /dev/shm).
The workaround is to configure everything at the daemon level once #29492 is fixed, but it's far from ideal.
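For example, once that lands, the daemon-level workaround could look something like this (a sketch assuming a daemon version that supports a default shm size and default ulimits in daemon.json):
$ cat /etc/docker/daemon.json
{
  "default-shm-size": "1G",
  "default-ulimits": {
    "memlock": { "Name": "memlock", "Hard": -1, "Soft": -1 }
  }
}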
@justincormack @crosbymichael Do you have a timeline on the containerd 1.0 release and integration in Docker?
Right now the only option we have is to integrate at the runc level given that the containerd "runtime" is hardcoded. I would rather do it with containerd 1.0 if Docker were to support it.
@3XX0 containerd 1.0 is now integrated into Docker.
@3XX0 Thanks for your excellent work!
I have some questions about nvidia-docker.
For example, assume a PC has two NVIDIA GPUs, GPU A and GPU B, where the index of GPU A is 0 and the index of GPU B is 1. When I execute the following command:
NV_GPU='0' nvidia-docker run -d nginx
can this container use GPU B? From your comments above, it should not.
Since my PC has only one NVIDIA GPU, I can't try this to confirm it.
However, I have tried running the following two commands respectively:
NV_GPU='0' nvidia-docker run -d nginx
docker run -d nginx
and I did not notice any difference between the two containers.
In both containers I can get the information of my GPU.
Did I miss something, or is this just how it works? Looking forward to your reply, thanks!
@WanLinghao Inside the docker run -d nginx container, you should not see the GPUs unless you have a special configuration. Can you double-check?
@flx42 I have tried three kinds of commands:
1. docker run -d nginx
2. docker run --privileged=true -d nginx
3. NV_GPU='0' nvidia-docker run -d nginx
I then went into their shells respectively and executed:
find / -name '*nvidia*'
I got the following results:
first container and third container:
/sys/bus/pci/drivers/nvidia
/sys/kernel/slab/nvidia_pte_cache
/sys/kernel/slab/nvidia_p2p_page_cache
/sys/kernel/slab/nvidia_stack_cache
/sys/module/drm/holders/nvidia_drm
/sys/module/drm_kms_helper/holders/nvidia_drm
/sys/module/nvidia_modeset
/sys/module/nvidia_modeset/holders/nvidia_drm
/sys/module/nvidia
/sys/module/nvidia/drivers/pci:nvidia
/sys/module/nvidia/holders/nvidia_modeset
/sys/module/nvidia/holders/nvidia_uvm
/sys/module/nvidia_drm
/sys/module/nvidia_uvm
/proc/irq/33/nvidia
/proc/driver/nvidia
/proc/driver/nvidia-uvm
second container:
/dev/nvidiactl
/dev/nvidia0
/dev/nvidia-uvm-tools
/dev/nvidia-uvm
/dev/nvidia-modeset
/sys/bus/pci/drivers/nvidia
/sys/kernel/slab/nvidia_pte_cache
/sys/kernel/slab/nvidia_p2p_page_cache
/sys/kernel/slab/nvidia_stack_cache
/sys/module/drm/holders/nvidia_drm
/sys/module/drm_kms_helper/holders/nvidia_drm
/sys/module/nvidia_modeset
/sys/module/nvidia_modeset/holders/nvidia_drm
/sys/module/nvidia
/sys/module/nvidia/drivers/pci:nvidia
/sys/module/nvidia/holders/nvidia_modeset
/sys/module/nvidia/holders/nvidia_uvm
/sys/module/nvidia_drm
/sys/module/nvidia_uvm
/proc/irq/33/nvidia
/proc/driver/nvidia
/proc/driver/nvidia-uvm
As you can see, commands 1 and 3 make no difference.
Can you give more information about this?
Looking forward to your reply, thanks!
Ah yes, nvidia-docker (version 1.0) will be a passthrough to docker for this nginx image. We enable GPU support only for images that are based on our nvidia/cuda images from Docker Hub; we detect whether the image has a special label for this purpose.
Note that version 2.0 of nvidia-docker behaves differently (documented on our README).
So, try again with nvidia/cuda as the image.
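For example, with an image derived from nvidia/cuda, a quick check like this should show the GPUs inside the container (nvidia-docker 1.0 syntax):
$ NV_GPU=0 nvidia-docker run --rm nvidia/cuda nvidia-smi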
Any news on this? Would ❤️ a LinuxKit distro with nvidia-docker onboard 🙌✨
//cc @3XX0 @justincormack
Trying to revive interest in supporting OCI hooks here: #36987
If there is interest, we can close this issue once it's implemented.
Docker 19.03 has docker run --gpus. Closing this issue.
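For reference, the new flag is used along these lines (the image tag is just an example):
$ docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
$ docker run --gpus '"device=0,1"' nvidia/cuda:10.0-base nvidia-smi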