/ofed-docker

Containerized Nvidia Mellanox drivers

This repository provides the means to build driver containers for various Linux distributions.

Driver containers offered:

  • Mellanox OFED driver container: Mellanox out-of-tree networking driver
  • NV Peer Memory driver container: Nvidia Peer Memory client driver for GPUDirect

What are driver containers?

Driver containers are containers that allow provisioning of a driver on the host. They provide several benefits over a standard driver installation, for example:

  • Ease of deployment
  • Fast installation

Containerized Mellanox OFED driver

This container is intended as an alternative to installing the driver directly on the host. Simply deploy the container image on the host, and the container will:

  • Reload the kernel modules provided by Mellanox OFED
  • Mount the container's root fs at /run/mellanox/drivers/. If this directory is mapped to the host, the container's content becomes available to the host and to other containers. One use case is compiling the Nvidia Peer Memory client modules.

Containerized Mellanox OFED - Image build

The image must be built on the same OS and kernel version as the host it will be deployed on.

The provided Dockerfiles expose several build arguments, which make it possible to build a container image for various driver versions and platforms.

Build arguments

  • D_OFED_VERSION: Mellanox OFED version as it appears on the Mellanox OFED download page, e.g. 5.0-2.1.8.0
  • D_OS: Operating system version as it appears on the Mellanox OFED download page, e.g. ubuntu20.04
  • D_ARCH: CPU architecture as it appears on the Mellanox OFED download page, e.g. x86_64
  • D_BASE_IMAGE: Base image to be used for the driver container image build. Default: ubuntu:20.04

Build - Ubuntu

# docker build -t ofed-driver \
--build-arg D_BASE_IMAGE=ubuntu:20.04 \
--build-arg D_OFED_VERSION=5.0-2.1.8.0 \
--build-arg D_OS=ubuntu20.04 \
--build-arg D_ARCH=x86_64 \
ubuntu/
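
The image has to match the deployment host, so one option is to derive D_OS and D_ARCH from the build host instead of typing them by hand. The sketch below rests on two assumptions. First, that the download-page OS name is `<ID><VERSION_ID>` taken from /etc/os-release. This holds for Ubuntu (e.g. ubuntu20.04) but may not hold for other distributions. Second, that `uname -m` matches the D_ARCH naming. The helpers detect_d_os and ofed_build_cmd are hypothetical. ofed_build_cmd only prints the command, so you can review it before running it.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions (hypothetical helpers, not part of this repo):
#  - the OFED download-page OS name is "<ID><VERSION_ID>" from os-release
#    (true for e.g. ubuntu20.04; other distributions may differ)
#  - `uname -m` matches the D_ARCH naming (e.g. x86_64)

# detect_d_os [os-release-file]: print the D_OS value, e.g. "ubuntu20.04"
detect_d_os() {
  local file="${1:-/etc/os-release}"
  # Source in a subshell so ID/VERSION_ID do not leak into the caller.
  ( . "$file" && printf '%s%s\n' "$ID" "$VERSION_ID" )
}

# ofed_build_cmd <ofed-version> [os-release-file]:
# print (do not run) the docker build command for the Ubuntu Dockerfile.
ofed_build_cmd() {
  local version="$1" file="${2:-/etc/os-release}"
  local os arch base
  os="$(detect_d_os "$file")" || return 1
  arch="${D_ARCH:-$(uname -m)}"            # override with D_ARCH=... if needed
  base="${D_BASE_IMAGE:-ubuntu:20.04}"     # README default base image
  printf 'docker build -t ofed-driver --build-arg D_BASE_IMAGE=%s --build-arg D_OFED_VERSION=%s --build-arg D_OS=%s --build-arg D_ARCH=%s ubuntu/\n' \
    "$base" "$version" "$os" "$arch"
}
```

For example, `eval "$(ofed_build_cmd 5.0-2.1.8.0)"` runs the printed command from the repository root.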

Build - CentOS

Coming soon...

Containerized Mellanox OFED - Run

# docker run --rm -it \
-v /run/mellanox/drivers:/run/mellanox/drivers:shared \
-v /etc/network:/etc/network \
-v /etc:/host/etc \
-v /lib/udev:/host/lib/udev \
--net=host --privileged ofed-driver
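
After the container has started, one way to confirm the result is to check two things: that the expected modules appear under /sys/module, and that the container's rootfs has been exposed at /run/mellanox/drivers. This is a minimal sketch. The helper modules_loaded is hypothetical, and the module names in the usage line (mlx5_core, mlx5_ib, ib_core) assume an mlx5-based adapter, so adjust them to your hardware.

```shell
#!/usr/bin/env bash
# Sketch only: modules_loaded is a hypothetical helper, not part of this repo.
# modules_loaded <sys-module-dir> <module>...
# Succeeds only if every named module has a directory under <sys-module-dir>
# (normally /sys/module); reports each missing one on stderr.
modules_loaded() {
  local dir="$1"; shift
  local m missing=0
  for m in "$@"; do
    if [ ! -d "$dir/$m" ]; then
      echo "not loaded: $m" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage (module names are an assumption): `modules_loaded /sys/module mlx5_core mlx5_ib ib_core && ls /run/mellanox/drivers`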

Containerized Nvidia Peer Memory Client driver

This container is intended as an alternative to installing the driver directly on the host. Simply deploy the container image on the host, and the container will:

  • Compile nv_peer_mem kernel module
  • Reload nv_peer_mem kernel module

Because the Nvidia peer memory client module must be compiled against the Mellanox OFED and Nvidia drivers currently installed on the machine, the container expects the root fs where the Mellanox OFED drivers are installed to be mounted at /run/mellanox/drivers, and the root fs where the Nvidia drivers are installed to be mounted at /run/nvidia/drivers.

This is best suited to setups where both the Mellanox NIC and Nvidia GPU drivers are provisioned via driver containers, since those containers expose their root fs.
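
Because compilation depends on both root filesystems being present, a pre-flight check on the host can catch a missing mount before the nv-peer-mem container starts. This is a minimal sketch under one assumption: an existing, non-empty directory is treated as "present". The helper require_rootfs is hypothetical, and the paths in the usage line are the host paths used in the run example for this container.

```shell
#!/usr/bin/env bash
# Sketch only: require_rootfs is a hypothetical helper, not part of this repo.
# require_rootfs <dir>...
# Fails (and reports on stderr) unless each directory exists and is non-empty.
# "Non-empty" is a coarse heuristic, not a check that the drivers are usable.
require_rootfs() {
  local d rc=0
  for d in "$@"; do
    if [ -d "$d" ] && [ -n "$(ls -A "$d" 2>/dev/null)" ]; then
      continue
    fi
    echo "missing or empty driver rootfs: $d" >&2
    rc=1
  done
  return "$rc"
}
```

Usage: `require_rootfs /run/mellanox/drivers /run/nvidia/driver && echo "ready to start nv-peer-mem"`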

Containerized Nvidia Peer Memory Client driver - Image build

Build arguments

  • D_BASE_IMAGE: Base image to be used when building the container image (Default: ubuntu:20.04)
  • D_NV_PEER_MEM_BRANCH: Branch/tag of the nv_peer_memory repository (Default: master)

Build - Ubuntu

# docker build -t nv-peer-mem \
--build-arg D_BASE_IMAGE=ubuntu:20.04 \
--build-arg D_NV_PEER_MEM_BRANCH=1.0-9 \
gpu-direct/ubuntu/

Build - CentOS

Coming soon...

Containerized Nvidia Peer Memory Client driver - Run

In the example below, the Mellanox driver container's rootfs is mounted on the host at /run/mellanox/drivers, and the Nvidia driver container's rootfs is mounted on the host at /run/nvidia/driver.

# docker run --rm -it \
-v /run/mellanox/drivers:/run/mellanox/drivers \
-v /run/nvidia/driver:/run/nvidia/drivers \
--privileged nv-peer-mem

Driver container readiness

A driver container loads kernel modules into the running kernel, optionally preceded by a compilation step.

The process is not atomic because:

  1. A driver is often composed of multiple modules, which are loaded into the kernel sequentially.
  2. Compilation, if it takes place, takes time.

To mark the completion of the driver loading phase, the driver container creates a file at its root directory: /.driver-ready. The file's existence indicates that the driver has been successfully loaded into the running kernel, so a container orchestrator can use it to probe the driver container for readiness.
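
Outside an orchestrator, the same readiness signal can be polled from the host with a simple retry loop. This is a minimal sketch, and the helper wait_until is hypothetical. The usage line also assumes the container was started with `--name ofed-driver`, which the run examples above do not do.

```shell
#!/usr/bin/env bash
# Sketch only: wait_until is a hypothetical helper, not part of this repo.
# wait_until <timeout-seconds> <interval-seconds> <command...>
# Re-runs <command> until it succeeds; returns 1 once <timeout-seconds>
# have elapsed (measured with bash's SECONDS counter).
wait_until() {
  local timeout="$1" interval="$2"; shift 2
  local start=$SECONDS
  until "$@"; do
    if (( SECONDS - start >= timeout )); then
      echo "timed out after ${timeout}s: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

Usage (assumes `--name ofed-driver`): `wait_until 600 5 docker exec ofed-driver test -f /.driver-ready`. The generous timeout leaves room for the optional compilation step.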

Limitations

Having the rdma-core package installed on the host may prevent the Mellanox OFED driver container from properly loading drivers. This is because rdma-core installs udev rules that trigger driver module loading from the host, and it also loads storage modules on system startup.
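
Before deploying the driver container, it can help to check whether rdma-core's udev rules are present on the host. This is a minimal sketch under two assumptions: that those rule files have "rdma" in their names, and that they live under /lib/udev/rules.d or /etc/udev/rules.d. The helper find_rdma_udev_rules is hypothetical, and a match only suggests that rdma-core is installed.

```shell
#!/usr/bin/env bash
# Sketch only: find_rdma_udev_rules is a hypothetical helper, not part of this repo.
# find_rdma_udev_rules [dir...]
# Prints udev rule files whose names contain "rdma". Defaults to the usual
# system and admin rule directories (an assumption; adjust for your distro).
find_rdma_udev_rules() {
  local dirs=("$@")
  (( ${#dirs[@]} )) || dirs=(/lib/udev/rules.d /etc/udev/rules.d)
  local d f
  for d in "${dirs[@]}"; do
    for f in "$d"/*rdma*.rules; do
      # Without nullglob an unmatched pattern stays literal; skip it.
      [ -e "$f" ] && printf '%s\n' "$f"
    done
  done
  return 0
}
```

Usage: `[ -n "$(find_rdma_udev_rules)" ] && echo "rdma-core udev rules found on host"`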