P1-Blocker: GPU workload cannot access the GPU devices from the container environment without setsebool container_use_devices on
vbedida79 opened this issue · 6 comments
Updated according to @mregmi's and @vbedida79's comments
Summary
GPU workload cannot access the GPU devices from the container environment without setsebool container_use_devices on
Detail
GPU workload pods requesting the gpu.intel.com/i915 resource cannot run until they have access to /dev/drm on the GPU node. This can be achieved by setting setsebool container_use_devices on on the host node. That is not feasible when a cluster has multiple GPU nodes, because the permission has to be set on each node manually.
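For reference, a minimal sketch of such a workload pod; only the gpu.intel.com/i915 resource request matters here, and the pod and image names are placeholders:
$ oc apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-clinfo-test                   # placeholder name
spec:
  restartPolicy: Never
  containers:
  - name: clinfo
    image: quay.io/example/clinfo:latest  # placeholder image
    command: ["clinfo"]
    resources:
      limits:
        gpu.intel.com/i915: 1             # one Intel GPU from the device plugin
EOF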
Root cause
The /dev/drm access permission has not been added to the container_device_t policy, so access to /dev/drm is blocked by SELinux. As a result, the workload application cannot access the GPU device node files from the container environment.
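This can be confirmed on an affected node (a sketch, run from the node terminal; it assumes the GPU devices are exposed under /dev/dri, as in the AVC denials later in this thread):
$ chroot /host
# SELinux type of the GPU device nodes on the host (expected: dri_device_t)
$ ls -Z /dev/dri/
# Recent SELinux denials involving the GPU device label
$ ausearch -m avc -ts recent | grep dri_device_t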
Solution
- Work with container-selinux upstream to add the needed permission, and make sure the container-selinux release containing the fix gets merged into an OCP release.
- Until it is merged into an OCP release, we have to distribute the new policy through the user-container-policy project (see the node check sketched below).
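Until that lands, a node can be checked for the container-selinux build it carries and the current state of the boolean (a sketch, run from the node terminal):
$ chroot /host
$ rpm -q container-selinux          # container-selinux build shipped on the node
$ getsebool container_use_devices   # off by default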
Workaround
To ensure all GPU workloads (clinfo, AI inference) work properly, please run the following commands on the GPU nodes (a scripted alternative for multiple nodes is sketched after these steps).
- Find all nodes with an Intel Data Center GPU card using the following command:
$ oc get nodes -l intel.feature.node.kubernetes.io/gpu=true
Example output:
NAME         STATUS   ROLES    AGE   VERSION
icx-dgpu-1   Ready    worker   30d   v1.25.4+18eadca
- Navigate to the node terminal on the web console (Compute -> Nodes -> select a node -> Terminal) and run the following commands in the terminal. Repeat this step for any other nodes with an Intel Data Center GPU card.
$ chroot /host
$ setsebool container_use_devices on
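For clusters with many GPU nodes, the same workaround can be scripted from a workstation with cluster-admin access instead of opening each node terminal (a sketch using oc debug):
# For each GPU node, oc debug starts a temporary debug pod and runs the command against the host
$ for node in $(oc get nodes -l intel.feature.node.kubernetes.io/gpu=true -o name); do oc debug "$node" -- chroot /host setsebool container_use_devices on; done
Note that setsebool without -P does not persist across a node reboot, so the setting may need to be reapplied (or made persistent with -P) after nodes restart.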
@vbedida79 which milestone is this targeted for, please add the milestone. Thanks!
The solution with the container_device_t option fails with:
type=AVC msg=audit(1693240354.718:400): avc: denied { getattr } for pid=1634424 comm="ls" path="/dev/sda1" dev="devtmpfs" ino=65545 scontext=system_u:system_r:container_device_t:s0:c17,c28 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file permissive=0
type=AVC msg=audit(1693240354.718:401): avc: denied { getattr } for pid=1634424 comm="ls" path="/dev/sda" dev="devtmpfs" ino=65544 scontext=system_u:system_r:container_device_t:s0:c17,c28 tcontext=system_u:object_r:fixed_disk_device_t:s0 tclass=blk_file permissive=0
type=AVC msg=audit(1693240354.718:402): avc: denied { getattr } for pid=1634424 comm="ls" path="/dev/log" dev="devtmpfs" ino=22631 scontext=system_u:system_r:container_device_t:s0:c17,c28 tcontext=system_u:object_r:devlog_t:s0 tclass=lnk_file permissive=0
type=AVC msg=audit(1693240473.700:403): avc: denied { map } for pid=1637667 comm="clinfo" path="/dev/dri/renderD128" dev="devtmpfs" ino=360931 scontext=system_u:system_r:container_device_t:s0:c17,c28 tcontext=system_u:object_r:dri_device_t:s0 tclass=chr_file permissive=0
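To see which rules the container_device_t domain would still need for these denials, the audit records can be fed to audit2allow on the affected node (a sketch; the output is for inspection, not necessarily for loading as a local module as-is):
$ chroot /host
$ ausearch -m avc -ts recent | audit2allow
# It would suggest rules along the lines of (illustrative):
#   allow container_device_t dri_device_t:chr_file map;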
We need to check for the right permissions or missing options in the custom SCC. For v1.0.1, the workaround in PR #104 needs to be followed.
The device plugin gives access to the devices the workload requested, so the workload can run without a custom policy or permissions. But on the node, SELinux blocks the access anyway because it has no concept of containers; that is why the container_use_devices boolean exists. So for plugin workloads to have access to the devices the plugin exported, the node must have container_use_devices set to on (it is off by default). The alternative is to use a custom SELinux label and SCC, as sketched below.
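As a concrete sketch of that alternative, a workload pod can request an explicit SELinux type through its securityContext. This assumes the SCC granted to the pod's service account permits the chosen type (here container_device_t, the option discussed above); the pod and image names are placeholders:
$ oc apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-clinfo-custom-label           # placeholder name
spec:
  restartPolicy: Never
  securityContext:
    seLinuxOptions:
      type: container_device_t            # assumed custom type; default would be container_t
  containers:
  - name: clinfo
    image: quay.io/example/clinfo:latest  # placeholder image
    command: ["clinfo"]
    resources:
      limits:
        gpu.intel.com/i915: 1
EOF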
@rhatdan We have a question about how SELinux labels translate between device files volume-mounted in a pod and on the node. We have a workload pod that requests a resource from a device plugin, and as a result that device is shared with the workload. The device file is visible to the pod as container_file_t, while on the node it is device_t. The pod can access the file without the container_use_devices boolean set to on; the translation works in this case.
But in another case (GPU), the device file visible to the pod is container_file_t while on the node it is dri_device_t. In this case SELinux blocks the access and we see the following error. Is it a bug that GPU devices (dri_device_t) are treated differently?
type=AVC msg=audit(1694192866.178:142): avc: denied { read write } for pid=2741395 comm="clinfo" name="renderD128" dev="devtmpfs" ino=124588 scontext=system_u:system_r:container_t:s0:c17,c28 tcontext=system_u:object_r:dri_device_t:s0 tclass=chr_file permissive=0
Thanks.
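For reference, the label comparison described above can be reproduced with something like the following (a sketch; the pod name is a placeholder and it assumes ls inside the image supports -Z):
# On the GPU node (node terminal or oc debug):
$ ls -Z /dev/dri/renderD128                                 # label as seen on the host
# From the workload pod that was granted the device:
$ oc exec <gpu-workload-pod> -- ls -Z /dev/dri/renderD128   # label as seen inside the container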
We are currently testing a fix from Dan. containers/container-selinux#268
This issue is fixed in the 1.2.0 release: https://github.com/intel/intel-technology-enabling-for-openshift/releases/tag/v1.2.0