intel/intel-device-plugins-for-kubernetes

[QAT] Why kernel driver passthrough all uio files


Hi, I am reading through the implementation of kerneldrv for QAT devices. We may want to use the OOT (out-of-tree) driver because of our kernel version, even though kerneldrv is being deprecated.

I see you mentioned that, for now, it passes through all UIO interfaces to each container. May I ask why? Is it possible to pass through only the UIO interfaces required by each endpoint?

Thanks in advance

Is it possible to only passthrough required UIO interfaces for each endpoint?

At the time it was written, all containers needed all UIO interfaces because of how the driver SW stack was implemented. Things may have changed on that front, but that does not change the fact that kerneldrv is going to be removed eventually.
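The two passthrough policies discussed above ("every container gets every UIO node" versus "only the nodes an endpoint needs") can be contrasted in a small sketch. It is a Python illustration, not the plugin's Go code. `DeviceSpec` is a hypothetical stand-in loosely modelled on the Kubernetes device plugin API's device spec (host path, container path, permissions), and the endpoint-to-UIO mapping is invented for illustration only, since the thread does not say how UIO nodes relate to QAT endpoints.

```python
"""Hypothetical sketch: choose which UIO device nodes to pass into a container.

Assumption: the endpoint-to-UIO mapping passed to endpoint_uio_specs() is
illustrative only; it is not taken from the QAT driver or the device plugin.
"""
import glob
import os
import re
from dataclasses import dataclass
from typing import Dict, List

# Matches device node names such as "uio0" or "uio12".
_UIO_RE = re.compile(r"uio(\d+)")


@dataclass(frozen=True)
class DeviceSpec:
    # Hypothetical stand-in: same path on the host and inside the container.
    host_path: str
    container_path: str
    permissions: str = "rw"


def _uio_index(path: str) -> int:
    """Numeric index of a UIO node so that uio2 sorts before uio10."""
    m = _UIO_RE.fullmatch(os.path.basename(path))
    return int(m.group(1)) if m else -1


def discover_uio_nodes(dev_root: str = "/dev") -> List[str]:
    """List /dev/uioN nodes under dev_root, sorted by N."""
    paths = [p for p in glob.glob(os.path.join(dev_root, "uio*"))
             if _UIO_RE.fullmatch(os.path.basename(p))]
    return sorted(paths, key=_uio_index)


def all_uio_specs(uio_nodes: List[str]) -> List[DeviceSpec]:
    """kerneldrv-style policy: every container gets every UIO node."""
    return [DeviceSpec(p, p) for p in sorted(uio_nodes, key=_uio_index)]


def endpoint_uio_specs(endpoint: str,
                       mapping: Dict[str, List[str]]) -> List[DeviceSpec]:
    """Per-endpoint policy: only the UIO nodes assumed to belong to endpoint."""
    return [DeviceSpec(p, p)
            for p in sorted(mapping.get(endpoint, []), key=_uio_index)]
```

The per-endpoint variant only helps if the userspace stack inside the container can work with a subset of the nodes; per the reply above, that was not the case when kerneldrv was written, which is why the simpler "all nodes" policy was used.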

OK, that makes sense.

Btw, do you have any plans to support the OOT driver with Gen4 devices after kerneldrv is deprecated? There are still a number of downstream OSs whose kernel versions lag behind mainline, and as far as I can see the OOT driver is still actively supported.

No plans. For "QAT lib" based workloads/containers, we cannot support multiple kernel interfaces because that will create a compatibility mess due to the ABI incompatibility.

Thanks for the quick response

@Kewei-Lu I'd like to clarify the "downstream OS" topic as well. If I look at what, e.g., Ubuntu 22.04 (its latest HWE kernel is 6.5, and 24.04 will have 6.8), RHEL (e.g., for OpenShift), and SUSE offer, they are all capable of running what our in-tree plugin offers for qatlib-based workloads.

Yes, the latest versions of the downstream OSs would support almost everything we have in qatlib.

However, in our experience some end users are not able to move that quickly to more up-to-date infrastructure. As a result, they may still run Ubuntu 18.04, RHEL 8.6, etc., for stability.

For that reason, we want to build something that is independent of the OS version, which is why we are considering the OOT driver for now.

ubuntu 18.04, rhel 8.6, etc., for stabilization

8.6 supports Gen4 in-tree and 18.04 is EOL. OOT is going to cause you more troubles in the long run.

OOT is going to cause you more troubles in the long run.

Could you elaborate on the troubles, apart from the UIO framework versus VFIO?

You end up stuck maintaining the entire QAT stack on your nodes yourself. We don't offer tools for "driver lifecycle management", and the containers you build will depend on the OOT stack, so they won't run on systems with the in-tree driver. We are also going to see qatlib adopted by distributions, which means even less maintenance for your workloads to worry about.