Mellanox/k8s-rdma-shared-dev-plugin

Question: how does rdma-device-plugin mount the InfiniBand driver into the container?

Closed this issue · 6 comments

Hi, we installed rdma-device-plugin on our clusters and found that the InfiniBand driver is mounted into our container at /dev/infiniband once we specify the rdma resource in our pod resources.

But I am curious: how does rdma-device-plugin mount the InfiniBand driver into the container?

I am looking at the Allocate implementation: https://github.com/hustcat/k8s-rdma-device-plugin/blob/bd51ac30c8a6f5958cb66d9edd826f7584d50744/server.go#L176-L220

Only devicesList is set in the response:

response.ContainerResponses = append(response.ContainerResponses, &pluginapi.ContainerAllocateResponse{
	Devices: devicesList,
})

ContainerAllocateResponse is defined as follows. If Mounts is not set in the pluginapi.ContainerAllocateResponse above, how is the driver mounted into the container?

type ContainerAllocateResponse struct {
	// List of environment variable to be set in the container to access one of more devices.
	Envs map[string]string `` /* 149-byte string literal not displayed */
	// Mounts for the container.
	Mounts []*Mount `protobuf:"bytes,2,rep,name=mounts,proto3" json:"mounts,omitempty"`
	// Devices for the container.
	Devices []*DeviceSpec `protobuf:"bytes,3,rep,name=devices,proto3" json:"devices,omitempty"`
	// Container annotations to pass to the container runtime
	Annotations          map[string]string `` /* 163-byte string literal not displayed */
	XXX_NoUnkeyedLiteral struct{}          `json:"-"`
	XXX_sizecache        int32             `json:"-"`
}

Devices []*DeviceSpec

This field lets the device plugin provide the paths of device files for the container runtime to expose inside the container.
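To make that concrete, here is a minimal sketch of how an Allocate handler might fill in Devices. It does not import the real pluginapi package: DeviceSpec and ContainerAllocateResponse below are local stand-ins that mirror only the relevant fields, buildDeviceSpecs is a hypothetical helper, and the /dev/infiniband paths are illustrative assumptions rather than what any particular plugin reports.

```go
package main

import "fmt"

// DeviceSpec is a local stand-in mirroring the fields of the kubelet
// device plugin API's DeviceSpec (assumption: only these three matter here).
type DeviceSpec struct {
	ContainerPath string // path of the device file inside the container
	HostPath      string // path of the device file on the host
	Permissions   string // cgroup permissions, e.g. "rw"
}

// ContainerAllocateResponse is a stand-in carrying only the Devices field.
type ContainerAllocateResponse struct {
	Devices []*DeviceSpec
}

// buildDeviceSpecs (hypothetical helper) maps each host device file to the
// same path inside the container with read/write access.
func buildDeviceSpecs(hostPaths []string) []*DeviceSpec {
	specs := make([]*DeviceSpec, 0, len(hostPaths))
	for _, p := range hostPaths {
		specs = append(specs, &DeviceSpec{
			ContainerPath: p,
			HostPath:      p,
			Permissions:   "rw",
		})
	}
	return specs
}

func main() {
	// Illustrative device files; a real plugin discovers these on the host.
	paths := []string{"/dev/infiniband/uverbs0", "/dev/infiniband/rdma_cm"}
	resp := &ContainerAllocateResponse{Devices: buildDeviceSpecs(paths)}
	for _, d := range resp.Devices {
		fmt.Printf("%s -> %s (%s)\n", d.HostPath, d.ContainerPath, d.Permissions)
	}
}
```

No Mounts entry is involved: the container runtime exposes each listed host device file at the given container path, which is why /dev/infiniband shows up without a volume mount in the pod spec.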

@adrianchiris Thanks, so we don't need to mount /dev/infiniband in our pod ourselves.

@adrianchiris Hi, I'd like to use this issue to ask a related question.
I am curious why you allocate all RDMA devices to a container that requests any number of RDMA resources.
I mean, why not allocate only as many devices as the number of RDMA resources the container requests?

Looking forward to your reply. Thanks.

This device plugin exposes shared resources; the number of resources doesn't have much meaning other than to conform to how device plugin resources are modeled in k8s today.

@adrianchiris Thanks for your reply.

So this repository's function is that a Pod (container) requesting the resource (hca_shared_devices_a, for example) gets the devices configured on the machine (ib0, ib1, for example) mounted?

Then what do you mean by "shared" in your reply and in many other places, such as the repository name "k8s-rdma-shared-dev-plugin"? Does "shared" mean that every Pod (container) requesting the resource will mount the devices, so all of them have RDMA capability and share the RDMA devices?

Yes, they will all mount the same device resources.
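The sharing behavior described in this thread can be sketched as follows. This is only an illustration of the idea, not the plugin's actual code: allocateShared is a hypothetical function, and the device paths and resource IDs are made up. Each container request receives the full list of shared device files, however many resource IDs it asked for.

```go
package main

import "fmt"

// allocateShared (hypothetical) returns, for every container request, a copy
// of the complete shared device list. The requested IDs are ignored on
// purpose: in a shared model the count only satisfies the k8s resource API.
func allocateShared(sharedDevices []string, requests [][]string) [][]string {
	out := make([][]string, 0, len(requests))
	for range requests {
		devs := make([]string, len(sharedDevices))
		copy(devs, sharedDevices)
		out = append(out, devs)
	}
	return out
}

func main() {
	// Illustrative device files and resource IDs (assumptions).
	shared := []string{"/dev/infiniband/uverbs0", "/dev/infiniband/rdma_cm"}
	requests := [][]string{
		{"id-0"},                 // container asking for 1 resource
		{"id-1", "id-2", "id-3"}, // container asking for 3 resources
	}
	for i, devs := range allocateShared(shared, requests) {
		fmt.Printf("container %d requested %d, got devices %v\n",
			i, len(requests[i]), devs)
	}
}
```

Both containers end up with the same two device files, which matches the answer above: the requested count does not select a subset of devices.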