intel/intel-technology-enabling-for-openshift

P1-Blocker: Cannot run QAT workload in a non-privileged container

vbedida79 opened this issue · 5 comments

Updated the issue according to the comments from @vbedida79, @mregmi, and @mythi.

Summary

To run a containerized QATlib-based application on OCP, we currently have to run it in a privileged container. However, OCP does not allow privileged containers for user application workloads because of the security risk.

Detail

In the containerized OCP environment, running the cpa_sample_code test cases from qatlib in a non-privileged container fails during initialization because memory cannot be allocated. All tests pass when they are run in a privileged container.

dma_map_slab:200 VFIO_IOMMU_MAP_DMA failed va=7f8709cca000 iova=200000 size=200000 -- errno=12
[error] SalCtrl_ServiceInit() - : Failed to initialise all service instances
[error] SalCtrl_ServiceEventStart() - : Private data is NULL
qaeMemInit started
ADF_UIO_PROXY err: adf_init_ring: unable to get ringbuf(v:(nil),p:(nil)) for rings in bank(0)
ADF_UIO_PROXY err: icp_adf_transCreateHandle: adf_init_ring failed
ADF_UIO_PROXY err: adf_user_subsystemInit: Failed to initialise Subservice SAL
ADF_UIO_PROXY err: adf_user_subsystemStart: Failed to start Subservice SAL
ADF_UIO_PROXY err: icp_adf_subsystemUnregister: Failed to shutdown subservice SAL.
quickassist/lookaside/access_layer/src/sample_code/performance/cpa_sample_code_main.c, main():479 Could not start sal for user space
[error] SalCtrl_AdfServicesStartedCheck() - : Sal Ctrl failed to start in given time
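
The first line of the log is the most informative one: on Linux, errno 12 is ENOMEM, which is consistent with the kernel refusing to pin the DMA buffer because it would exceed the process's locked-memory limit (RLIMIT_MEMLOCK). A minimal diagnostic sketch that can be run inside the container follows. It assumes the size field in the log is hexadecimal (0x200000, i.e. 2 MiB); the helper names are illustrative and not part of qatlib.

```python
import errno
import os
import resource

# errno value copied from the VFIO_IOMMU_MAP_DMA line in the log above.
LOG_ERRNO = 12
# Assumption: the size field in the log is hexadecimal (0x200000 = 2 MiB).
MAP_SIZE = int("200000", 16)


def describe_errno(code: int) -> str:
    """Return the symbolic name and system message for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


def memlock_allows(size: int) -> bool:
    """True if the soft RLIMIT_MEMLOCK would permit locking `size` bytes."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft == resource.RLIM_INFINITY or soft >= size


if __name__ == "__main__":
    print(describe_errno(LOG_ERRNO))
    print("RLIMIT_MEMLOCK covers one 2 MiB mapping:", memlock_allows(MAP_SIZE))
```

If the second line prints False, a single mapping of the size shown in the log would already exceed the limit for a process that does not hold CAP_IPC_LOCK.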

Root cause

On OCP, the IPC_LOCK capability must be allowed in the SCC (security context constraints) so that the container can perform DMA from user space with the QAT VFIO device.
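
As a rough illustration of what this means for a workload, the sketch below builds a non-privileged Pod manifest that adds only IPC_LOCK and prints it as JSON (which `oc apply -f` accepts). The pod name, image, and QAT resource name are hypothetical placeholders, not values taken from this issue; the actual resource name depends on how the QAT device plugin is configured in the cluster.

```python
import json


def qat_pod_manifest(name: str, image: str, qat_resource: str) -> dict:
    """Build a non-privileged Pod spec whose container adds only IPC_LOCK."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "securityContext": {
                        "privileged": False,
                        # The one extra capability the issue identifies.
                        "capabilities": {"add": ["IPC_LOCK"]},
                    },
                    # Extended resources are requested through limits.
                    "resources": {"limits": {qat_resource: "1"}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # All three arguments are hypothetical placeholders.
    manifest = qat_pod_manifest(
        "qat-sample",
        "example.com/qatlib-sample:latest",
        "qat.intel.com/generic",
    )
    print(json.dumps(manifest, indent=2))
```

The pod is only admitted if the SCC bound to its service account permits the capability, for example by listing IPC_LOCK under `allowedCapabilities`.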

Solution

mregmi commented

We are still seeing the issue even after applying a workaround (setsebool container_use_devices on). It fails to allocate memory during initialization. Checking with the qatlib team for more details.

[error] SalCtrl_ServiceInit() - : Failed to initialise all service instances
[error] SalCtrl_ServiceEventStart() - : Private data is NULL
qaeMemInit started
ADF_UIO_PROXY err: adf_init_ring: unable to get ringbuf(v:(nil),p:(nil)) for rings in bank(0)
ADF_UIO_PROXY err: icp_adf_transCreateHandle: adf_init_ring failed
ADF_UIO_PROXY err: adf_user_subsystemInit: Failed to initialise Subservice SAL
ADF_UIO_PROXY err: adf_user_subsystemStart: Failed to start Subservice SAL
ADF_UIO_PROXY err: icp_adf_subsystemUnregister: Failed to shutdown subservice SAL.
quickassist/lookaside/access_layer/src/sample_code/performance/cpa_sample_code_main.c, main():479 Could not start sal for user space
[error] SalCtrl_AdfServicesStartedCheck() - : Sal Ctrl failed to start in given time

 

[error] do_userStart() - : Failed to start services
mregmi commented

Mikko:
Container runtimes limit the capabilities given to processes by default. QAT apps need the IPC_LOCK capability added to their deployments in order to DMA from user space.

Thanks @mythi @mregmi IPC_LOCK with container_device_t selinux option works for qatlib sample

mythi commented

> Thanks @mythi @mregmi IPC_LOCK with container_device_t selinux option works for qatlib sample

Does it also mean that no new selinux configs are needed?

> Does it also mean that no new selinux configs are needed?
Yes, I also tried without container_device_t. IPC_LOCK was the only thing missing. @mregmi, please correct me if I am wrong.
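
One way to confirm from inside a running container whether IPC_LOCK actually reached the process is to inspect the effective capability mask in /proc. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper names are illustrative. CAP_IPC_LOCK is bit 14 in linux/capability.h.

```python
CAP_IPC_LOCK = 14  # bit index defined in linux/capability.h


def has_capability(cap_mask_hex: str, cap_bit: int) -> bool:
    """Check whether `cap_bit` is set in a hex capability mask."""
    return bool((int(cap_mask_hex, 16) >> cap_bit) & 1)


def read_cap_eff(status_path: str = "/proc/self/status") -> str:
    """Return the CapEff hex mask of the current process."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise RuntimeError("CapEff not found in " + status_path)


if __name__ == "__main__":
    mask = read_cap_eff()
    print("CapEff:", mask)
    print("CAP_IPC_LOCK effective:", has_capability(mask, CAP_IPC_LOCK))
```

Running it once with and once without the capability in the pod spec gives a quick check that the SCC and deployment change took effect, independent of the QAT sample itself.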