P1-Blocker: Cannot run QAT workload in a non-privileged container
vbedida79 opened this issue · 5 comments
Updated the issue according to the comments from @vbedida79, @mregmi, and @mythi.
Summary
To run a containerized QATlib-based application in OCP, we currently have to run it as a privileged container. However, privileged containers are not allowed for user application workloads in OCP because of the security risk.
Detail
In the containerized OCP environment, we ran the cpa_sample_code test case from qatlib in a non-privileged container to validate QAT functionality. The tests fail during initialization because memory cannot be allocated. All tests pass when run in a privileged container.
dma_map_slab:200 VFIO_IOMMU_MAP_DMA failed va=7f8709cca000 iova=200000 size=200000 -- errno=12
[error] SalCtrl_ServiceInit() - : Failed to initialise all service instances
[error] SalCtrl_ServiceEventStart() - : Private data is NULL
qaeMemInit started
ADF_UIO_PROXY err: adf_init_ring: unable to get ringbuf(v:(nil),p:(nil)) for rings in bank(0)
ADF_UIO_PROXY err: icp_adf_transCreateHandle: adf_init_ring failed
ADF_UIO_PROXY err: adf_user_subsystemInit: Failed to initialise Subservice SAL
ADF_UIO_PROXY err: adf_user_subsystemStart: Failed to start Subservice SAL
ADF_UIO_PROXY err: icp_adf_subsystemUnregister: Failed to shutdown subservice SAL.
quickassist/lookaside/access_layer/src/sample_code/performance/cpa_sample_code_main.c, main():479 Could not start sal for user space
[error] SalCtrl_AdfServicesStartedCheck() - : Sal Ctrl failed to start in given time
Root cause
In OCP, the IPC_LOCK capability is needed in the SCC (security context constraints) to enable DMA from user space for the QAT VFIO device.
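The errno=12 on the VFIO_IOMMU_MAP_DMA line in the log above is ENOMEM on Linux. As far as we understand, VFIO pins the mapped pages and charges them against the process's locked-memory limit (RLIMIT_MEMLOCK) unless the process holds CAP_IPC_LOCK. The following is a minimal Python sketch for checking both values from inside the container. It assumes a Linux /proc filesystem and a Python interpreter in the image; the helper names are illustrative and not part of qatlib.

```python
import resource

CAP_IPC_LOCK = 14  # bit index of CAP_IPC_LOCK in linux/capability.h


def has_capability(cap_mask: int, cap_number: int) -> bool:
    """Return True if bit `cap_number` is set in a capability bitmask."""
    return bool((cap_mask >> cap_number) & 1)


def read_effective_caps(status_path: str = "/proc/self/status") -> int:
    """Parse the CapEff line (a hex bitmask) from a /proc/<pid>/status file."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return int(line.split()[1], 16)
    raise RuntimeError(f"no CapEff line in {status_path}")


def describe_limit(value: int) -> str:
    """Render an rlimit value, treating RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    effective = read_effective_caps()
    print("CAP_IPC_LOCK effective:", has_capability(effective, CAP_IPC_LOCK))
    print("RLIMIT_MEMLOCK soft:", describe_limit(soft))
    print("RLIMIT_MEMLOCK hard:", describe_limit(hard))
```

If the capability is reported as missing and the soft limit is smaller than the size in the failing mapping (size=200000 in hex, i.e. 2 MiB), that would be consistent with the failure shown in the log.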
Solution
- It looks like we cannot use a predefined OCP SCC.
- We might need to define and maintain a reference user-defined SCC that adds the IPC_LOCK capability.
- We must be very careful when defining and maintaining a reference user SCC.
- So we might need to create our own SCC for QAT workloads, starting from the predefined OCP "restricted" or "restricted-v2" SCC and adding the IPC_LOCK capability.
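As a rough illustration of the last point, the sketch below builds a custom SCC manifest in Python and prints it as JSON. The SCC name "qat-restricted" is hypothetical, and the field values only approximate restricted-v2; they are assumptions that should be compared against the output of `oc get scc restricted-v2 -o yaml` on the target cluster before use. The only intended difference is the extra IPC_LOCK entry in allowedCapabilities.

```python
import json


def build_qat_scc(name: str = "qat-restricted") -> dict:
    """Build an SCC manifest modelled on restricted-v2, plus IPC_LOCK.

    The name is hypothetical and the values approximate restricted-v2;
    verify them against the cluster's own definition.
    """
    return {
        "apiVersion": "security.openshift.io/v1",
        "kind": "SecurityContextConstraints",
        "metadata": {"name": name},
        "allowPrivilegedContainer": False,
        "allowPrivilegeEscalation": False,
        "allowHostDirVolumePlugin": False,
        "allowHostIPC": False,
        "allowHostNetwork": False,
        "allowHostPID": False,
        "allowHostPorts": False,
        "readOnlyRootFilesystem": False,
        "requiredDropCapabilities": ["ALL"],
        "defaultAddCapabilities": [],
        # The one intended change relative to restricted-v2: permit IPC_LOCK.
        "allowedCapabilities": ["NET_BIND_SERVICE", "IPC_LOCK"],
        "runAsUser": {"type": "MustRunAsRange"},
        "seLinuxContext": {"type": "MustRunAs"},
        "fsGroup": {"type": "MustRunAs"},
        "supplementalGroups": {"type": "RunAsAny"},
        "seccompProfiles": ["runtime/default"],
        "volumes": [
            "configMap",
            "downwardAPI",
            "emptyDir",
            "persistentVolumeClaim",
            "projected",
            "secret",
        ],
    }


if __name__ == "__main__":
    # JSON is accepted wherever a YAML manifest is, e.g. piped to `oc apply -f -`.
    print(json.dumps(build_qat_scc(), indent=2))
```

An SCC only permits the capability; the workload still has to request it in its own securityContext, and the SCC has to be granted to the workload's service account (for example with `oc adm policy add-scc-to-user`).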
We are still seeing an issue even after applying a workaround (setsebool container_use_devices on). It fails to allocate memory during initialization. Checking with the qatlib team for more details.
[error] SalCtrl_ServiceInit() - : Failed to initialise all service instances
[error] SalCtrl_ServiceEventStart() - : Private data is NULL
qaeMemInit started
ADF_UIO_PROXY err: adf_init_ring: unable to get ringbuf(v:(nil),p:(nil)) for rings in bank(0)
ADF_UIO_PROXY err: icp_adf_transCreateHandle: adf_init_ring failed
ADF_UIO_PROXY err: adf_user_subsystemInit: Failed to initialise Subservice SAL
ADF_UIO_PROXY err: adf_user_subsystemStart: Failed to start Subservice SAL
ADF_UIO_PROXY err: icp_adf_subsystemUnregister: Failed to shutdown subservice SAL.
quickassist/lookaside/access_layer/src/sample_code/performance/cpa_sample_code_main.c, main():479 Could not start sal for user space
[error] SalCtrl_AdfServicesStartedCheck() - : Sal Ctrl failed to start in given time
[error] do_userStart() - : Failed to start services
Mikko:
Container runtimes limit the capabilities given to processes by default. QAT apps need to add the IPC_LOCK capability to their deployments in order to DMA from user space.
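A minimal sketch of what adding the capability to a workload might look like, written as a Python function that emits a Pod manifest as JSON. The pod name and image are placeholders, and the QAT device resource request is deliberately left out because its name depends on how the device plugin is configured. The same securityContext would go into the pod template of a Deployment.

```python
import json


def build_qat_pod(name: str, image: str) -> dict:
    """Build a non-privileged Pod manifest that requests only IPC_LOCK.

    `name` and `image` are placeholders supplied by the caller.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "securityContext": {
                        "privileged": False,
                        "allowPrivilegeEscalation": False,
                        # Drop everything, then add back the single
                        # capability needed for user-space DMA.
                        "capabilities": {"drop": ["ALL"], "add": ["IPC_LOCK"]},
                    },
                }
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical name and image, for illustration only.
    pod = build_qat_pod("qat-sample", "registry.example.com/qat-sample:latest")
    print(json.dumps(pod, indent=2))
```

On OCP the pod is only admitted if an SCC available to its service account allows IPC_LOCK, which is why the SCC question above still has to be resolved.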