containers/container-selinux

dri_device_t cannot be accessed correctly by pods using device plugins.


SELinux label translation for device files made available by device plugins does not work for GPUs (dri_device_t). We have a workload pod that requests a resource from a device plugin, and as a result the device is shared with the workload. In one case the device file is visible to the pod as container_file_t while on the node it is device_t; the pod can access the file without the container_use_devices boolean enabled, so the translation works there.
But in another case (GPU), the device file visible to the pod is container_file_t and on the node it is dri_device_t. Here SELinux blocks the access and we see the following error.

type=AVC msg=audit(1694192866.178:142): avc: denied { read write } for pid=2741395 comm="clinfo" name="renderD128" dev="devtmpfs" ino=124588 scontext=system_u:system_r:container_t:s0:c17,c28 tcontext=system_u:object_r:dri_device_t:s0 tclass=chr_file permissive=0
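
For anyone reproducing this, a quick way to pull the denial out of the audit log and see which policy rules it would require is the standard audit tooling (sketch; the -c filter matches the comm= field from the AVC above):

# show recent AVC denials for the failing process
ausearch -m avc -ts recent -c clinfo

# translate the denials into the allow rules they would need (for inspection only)
ausearch -m avc -ts recent -c clinfo | audit2allow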

Tried #268, but we are still having issues. It is now complaining about open.

type=AVC msg=audit(1694800707.873:1495): avc: denied { open } for pid=1073469 comm="clinfo" path="/dev/dri/by-path/pci-0000:37:00.0-render" dev="devtmpfs" ino=124588 scontext=system_u:system_r:container_t:s0:c17,c28 tcontext=system_u:object_r:dri_device_t:s0 tclass=chr_file permissive=0

Yup, that is what I thought would be an issue.

The question I have is whether this should be a boolean or allowed by default. The problem with allowing it by default is that it opens the risk of every container being able to use DRI devices, so I think it is best to only allow access via a boolean.

My previous attempt assumed that perhaps some other process had opened the device and would then allow the containers to read/write it. Allowing containers to open the device, along with read/write, basically removes the ability to prevent containers from accessing DRI devices if they escape containerization.
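
For reference, toggling such a boolean on a node would look just like the existing device boolean (sketch using container_use_devices, which is already in the policy; a DRI-specific boolean would be handled the same way):

# check the current value of the existing device boolean
getsebool container_use_devices

# enable it persistently on this node
setsebool -P container_use_devices 1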

So I am still confused about why this issue is not seen with our other device categories, which are labelled device_t; it is very specific to dri_device_t. The other devices work exactly the same way, just with a different label.
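
A quick way to compare what the loaded policy actually allows for the two labels (sketch; needs the setools package installed on the node):

# rules allowing container_t to use generic device_t character devices
sesearch -A -s container_t -t device_t -c chr_file

# the same query for dri_device_t; any boolean gating will show up as a conditional rule
sesearch -A -s container_t -t dri_device_t -c chr_file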

@rhatdan So we have two other devices, which are labelled as follows:
sh-5.1# ls -lZ /dev/sgx_provision
crw-------. 1 root root system_u:object_r:device_t:s0 10, 126 Oct 26 15:06 /dev/sgx_provision

sh-5.1# ls -lZ /dev/vfio
total 0
crw-rw----. 1 root hugetlbfs system_u:object_r:vfio_device_t:s0 235, 0 Oct 26 15:10 436

The workload containers have no issue accessing these files. The problem only appears while accessing /dev/dri/*, which is labelled dri_device_t.

Do they have a different set of policies?
The boolean method works, but customers need to set it on every node in the cluster. I am just trying to make sure I am not missing anything, since the other two devices do not have access issues.
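
For context, this is roughly what has to be run per node today (sketch; assumes a boolean such as container_use_devices and cluster-admin access via oc):

# enable the boolean on every node through a debug pod
# (oc debug mounts the node filesystem at /host inside the debug container)
for node in $(oc get nodes -o name); do
  oc debug "$node" -- chroot /host setsebool -P container_use_devices 1
done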

Are you using Podman on CRI-O within the cluster (I am assuming cluster == Kubernetes/OpenShift)?

Yes, we are using an OpenShift cluster, which uses CRI-O.

And then executing rootless podman within it?

Yes, the pod is rootless, but we do not directly use Podman. We just use oc apply to run the pod.

The device node is made available to the workload pod by the Intel device plugin operator. I think it uses the 'devices' field of the OCI runtime spec:
https://github.com/opencontainers/runtime-spec/blob/main/config-linux.md#devices
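
One way to confirm what actually ends up in the generated OCI config is to inspect the container on the node (sketch; the container name and ID are placeholders, and the exact JSON layout under .info depends on the CRI-O version):

# find the workload container, then dump the devices list from the runtime spec
crictl ps -q --name <workload-container-name>
crictl inspect <container-id> | jq '.info.runtimeSpec.linux.devices'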

We experimented a bit and relabelled the device file in question to device_t, but we still got the issue. So there might be something else going on when the plugin requests the resource. We will look into the plugin code to verify.
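
For completeness, the relabel experiment was roughly along these lines, using the render node from the AVC above (restorecon puts the default label back afterwards):

# temporarily relabel the device node on the host and check the result
chcon -t device_t /dev/dri/renderD128
ls -lZ /dev/dri/renderD128

# restore the default dri_device_t label
restorecon -v /dev/dri/renderD128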

We have narrowed the issue down to a symbolic link. When the device is accessed directly (/dev/dri/card0) there is no issue, but if the access is done via the symbolic link (/dev/dri/by-path/pci-0000:02:00.0-card), access is denied. After looking at the device files made available to the pod, it looks like they are not container_file_t but dri_device_t.

Not sure why the labels in by-path are not translated.
Here is how the devices show up on the host node and in the workload container.

Host Node:

sh-5.1# ls -lZ /dev/dri
total 0
drwxr-xr-x. 2 root root   system_u:object_r:device_t:s0          140 Oct 31 04:50 by-path
crw-rw----. 1 root video  system_u:object_r:dri_device_t:s0 226,   0 Oct 31 03:51 card0
crw-rw----. 1 root video  system_u:object_r:dri_device_t:s0 226,   1 Oct 31 04:50 card1
crw-rw----. 1 root video  system_u:object_r:dri_device_t:s0 226,   2 Oct 31 04:50 card2
crw-rw-rw-. 1 root render system_u:object_r:dri_device_t:s0 226, 128 Oct 31 04:50 renderD128
crw-rw-rw-. 1 root render system_u:object_r:dri_device_t:s0 226, 129 Oct 31 04:50 renderD129

sh-5.1# ls -lZ /dev/dri/by-path/
total 0
lrwxrwxrwx. 1 root root system_u:object_r:device_t:s0  8 Oct 31 03:51 pci-0000:02:00.0-card -> ../card0
lrwxrwxrwx. 1 root root system_u:object_r:device_t:s0  8 Oct 31 04:50 pci-0000:37:00.0-card -> ../card1
lrwxrwxrwx. 1 root root system_u:object_r:device_t:s0 13 Oct 31 04:50 pci-0000:37:00.0-render -> ../renderD128
lrwxrwxrwx. 1 root root system_u:object_r:device_t:s0  8 Oct 31 04:50 pci-0000:3c:00.0-card -> ../card2
lrwxrwxrwx. 1 root root system_u:object_r:device_t:s0 13 Oct 31 04:50 pci-0000:3c:00.0-render -> ../renderD129

Workload Pod:

sh-4.4$ ls -laZ /dev/dri   
total 0
drwxr-xr-x. 3 root root  system_u:object_r:container_file_t:s0:c27,c28      140 Nov  8 22:42 .
drwxr-xr-x. 6 root root  system_u:object_r:container_file_t:s0:c27,c28      380 Nov  8 22:42 ..
drwxr-xr-x. 2 root root  system_u:object_r:container_file_t:s0:c27,c28      120 Nov  8 22:42 by-path
crw-rw-rw-. 1 root video system_u:object_r:container_file_t:s0:c27,c28 226,   1 Nov  8 22:42 card1
crw-rw-rw-. 1 root video system_u:object_r:container_file_t:s0:c27,c28 226,   2 Nov  8 22:42 card2
crw-rw-rw-. 1 root   797 system_u:object_r:container_file_t:s0:c27,c28 226, 128 Nov  8 22:42 renderD128
crw-rw-rw-. 1 root   797 system_u:object_r:container_file_t:s0:c27,c28 226, 129 Nov  8 22:42 renderD129
sh-4.4$ ls -laZ /dev/dri/by-path/
total 0
drwxr-xr-x. 2 root root  system_u:object_r:container_file_t:s0:c27,c28      120 Nov  8 22:42 .
drwxr-xr-x. 3 root root  system_u:object_r:container_file_t:s0:c27,c28      140 Nov  8 22:42 ..
crw-rw----. 1 root video system_u:object_r:dri_device_t:s0             226,   1 Oct 31 04:50 pci-0000:37:00.0-card
crw-rw-rw-. 1 root   797 system_u:object_r:dri_device_t:s0             226, 128 Oct 31 04:50 pci-0000:37:00.0-render
crw-rw----. 1 root video system_u:object_r:dri_device_t:s0             226,   2 Oct 31 04:50 pci-0000:3c:00.0-card
crw-rw-rw-. 1 root   797 system_u:object_r:dri_device_t:s0             226, 129 Oct 31 04:50 pci-0000:3c:00.0-render
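
A quick way to double-check what those by-path entries actually are inside the pod (device node vs. symlink) and how they got there (sketch, run inside the workload container; paths taken from the listing above):

# file type and SELinux context of the by-path entry and of the device it points at on the host
stat -c '%n %F %C' /dev/dri/by-path/pci-0000:37:00.0-render /dev/dri/renderD128

# check whether the by-path entries were bind-mounted into the container
grep by-path /proc/self/mountinfo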

@mythi

Yes, the pod is rootless, but we do not directly use Podman.

I believe there's a conflict here: the pod containers are merely run as a non-root user (non-zero UID/GID), but the runtime stack itself is not rootless.

After looking at the device files made available to the pod, it looks like they are not container_file_t but dri_device_t.

FWIW, the GPU plugin sets the symlinks via mounts[] as specified by the device plugin API.
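
If the relabelling only applies to entries injected via linux.devices and not to plain mounts, that would match the listings above. A sketch of how to compare the two in the generated runtime spec on the node (same caveats as earlier about the placeholder container ID and the .info layout under CRI-O):

# device nodes injected via linux.devices (these show up as container_file_t in the pod)
crictl inspect <container-id> | jq '.info.runtimeSpec.linux.devices'

# bind mounts covering the /dev/dri/by-path entries set up by the plugin (these keep the host label)
crictl inspect <container-id> | jq '.info.runtimeSpec.mounts[] | select(.destination | test("by-path"))'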