Assistance Needed: Getting NVIDIA GPU and Docker Runtime Working on Flatcar

Question

Keithsc opened this issue 3 months ago · 2 comments

Hello Flatcar Team,

I've been using Flatcar Linux for a while now and have recently acquired an NVIDIA Tesla M40 GPU. Having successfully utilized Intel GPUs with Docker on Flatcar, I'm now venturing into the world of NVIDIA for the first time.

I'm running a standalone Flatcar Linux instance without any orchestration. While I managed to install the NVIDIA driver as per the documentation, I am struggling to understand how to set up the NVIDIA Container Toolkit and configure Docker to utilize the GPU for applications.

As a newcomer to NVIDIA GPUs, I'd really appreciate guidance on using the GPU in Docker, particularly for setting up and running an application like open-webui. I believe a comprehensive tutorial or example guide would be beneficial, not just for me, but for others who are also new to Flatcar and NVIDIA.

Current Setup:

Flatcar Version : stable = 3975.2.0
Architecture : amd64
Docker Version : 24.0.9, build 293681613
GPU Model : NVIDIA Tesla M40
NVIDIA Driver Version : 535.104.05 (CUDA Version: 12.2)

What I Have Tried:
Followed existing documentation to install the NVIDIA driver.
Attempted to set up the NVIDIA Container Toolkit but faced challenges with runtime configurations.
Tried running a basic NVIDIA CUDA container.

Example Command and Error Encountered:

When I ran the following command:
docker run --gpus all nvidia/cuda:12.6.0-base-ubuntu24.04 nvidia-smi

I received the following error:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0001] error waiting for container: context canceled
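
This error usually appears when Docker cannot find the NVIDIA container toolkit on the host, so a quick first check is whether the relevant binaries are on PATH. The sketch below is only a diagnostic aid: `missing_tools` is a hypothetical helper, and the binary names in the usage note are the ones the toolkit normally ships, which is an assumption about this particular node.

```shell
#!/bin/sh
# Hypothetical helper: print each given command name that is NOT found on PATH.
# Prints nothing when every name resolves.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

On the node, `missing_tools nvidia-smi nvidia-container-runtime nvidia-container-cli` would list whichever of those are absent. An empty result suggests the toolkit is present and the problem lies elsewhere.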

Issues Encountered:
Error messages related to GPU runtime selection when attempting to deploy containers.
Confusion over configuration settings for Docker on Flatcar relating to the NVIDIA runtime.

Request for Help:
Clear guidance or documentation on how to configure Docker to work with NVIDIA GPUs on Flatcar.
A step-by-step example of setting up a containerized application (like open-webui) that can utilize the NVIDIA GPU.
Any additional resources or links that could assist newcomers in setting up NVIDIA GPUs on Flatcar would be greatly appreciated.

Thank you for your help and support!
Keith.

Answer 1 · 2024-09-03T17:17:49.000Z

Download https://github.com/flatcar/sysext-bakery/releases/download/latest/nvidia_runtime-v1.16.1-x86-64.raw
and save it as /etc/extensions/nvidia_runtime.raw on your Flatcar node, then reboot the node. This installs and enables nvidia-container-runtime for Docker and containerd.
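
The download step can be sketched as a small shell helper. The URL and destination path come from this answer. The function name `fetch_sysext` is hypothetical, and the sketch assumes `curl` is available and that it runs as root, since /etc/extensions is normally root-owned.

```shell
#!/bin/sh
# URL and destination path are taken from the answer above.
SYSEXT_URL="https://github.com/flatcar/sysext-bakery/releases/download/latest/nvidia_runtime-v1.16.1-x86-64.raw"
SYSEXT_DEST="/etc/extensions/nvidia_runtime.raw"

# Hypothetical helper: download the sysext image to the given path
# (defaults to SYSEXT_DEST). Fails if the download fails.
fetch_sysext() {
    dest="${1:-$SYSEXT_DEST}"
    mkdir -p "$(dirname "$dest")" &&
        curl -fsSL -o "$dest" "$SYSEXT_URL"
}
```

After calling `fetch_sysext` as root, reboot the node (for example with `sudo systemctl reboot`) so the extension is picked up.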

If the node is part of a Kubernetes cluster, you can then install the GPU operator:

helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator \
    --set driver.enabled=false \
    --set toolkit.enabled=false

It should pass validation.
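
The `nvidia/gpu-operator` chart reference only resolves once a Helm repository named `nvidia` has been registered. A minimal sketch of that prerequisite follows. The repository URL is the one published in NVIDIA's GPU Operator documentation, and `add_nvidia_repo` is a hypothetical helper name.

```shell
#!/bin/sh
# Chart repository URL as published in NVIDIA's GPU Operator documentation.
NVIDIA_HELM_REPO="https://helm.ngc.nvidia.com/nvidia"

# Hypothetical helper: register the repository under the name "nvidia"
# and refresh the local chart index.
add_nvidia_repo() {
    helm repo add nvidia "$NVIDIA_HELM_REPO" &&
        helm repo update
}
```

Run `add_nvidia_repo` once on the machine where Helm is used, before the `helm install` command.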

This will be added to Flatcar docs very soon.

Answer 2 · 2024-09-04T20:41:22.000Z

I've followed the instructions you provided, and it looks like everything is working as expected now. The sysext installation was straightforward, and after restarting my node, I'm able to run containers with GPU access using "docker run --gpus all nvidia/cuda:12.6.0-base-ubuntu24.04 nvidia-smi". Thanks for all your help with this.
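
For the open-webui case raised in the original question, the same `--gpus all` flag should carry over. The sketch below only prints a candidate command so it can be reviewed first. The image tag, port mapping, and volume name are assumptions based on the Open WebUI project's published Docker instructions and have not been verified on this setup. `open_webui_cmd` is a hypothetical helper.

```shell
#!/bin/sh
# Assumed image tag; check the Open WebUI documentation before use.
OPEN_WEBUI_IMAGE="ghcr.io/open-webui/open-webui:cuda"

# Hypothetical helper: print the docker command instead of running it.
open_webui_cmd() {
    echo docker run -d --gpus all \
        -p 3000:8080 \
        -v open-webui:/app/backend/data \
        --name open-webui \
        --restart always \
        "$OPEN_WEBUI_IMAGE"
}
```

Running `open_webui_cmd | sh` would start the container. With the assumed port mapping, the web interface would then be reachable on port 3000 of the node.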