cncf/tag-runtime

Container Device Interface

RenaudWasTaken opened this issue · 6 comments

Hello!

I would like to bring up the topic of third-party device support in container runtimes and orchestration engines.

In the past, we solved this problem through the device plugin mechanism.
However, in the 2 years this system has been in use, we have found that it wasn't exactly the right approach.
Some of the issues we've noticed:

  • Device support is Kubernetes-specific and can't be reused / extended at the runtime level.
    e.g. you can have GPU / FPGA support in Kubernetes, but there's no --fpga option in docker or --device=fpga
  • By being "on top" of the CRI, we don't have access to the container specification, which prevents us from doing any operations based on container information
  • We are seeing vendors use docker's default runtime / custom runtimes and CRI-O hooks as workarounds in addition to the device plugin.

After discussing this in sig-node and at KubeCon US (there's also some more context here), we settled on an approach to standardize device support at the runtime level (similar to the CNI), such that Kubernetes could re-use this support through the CRI.
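
To make the idea of runtime-level device support more concrete, here is a minimal sketch of how a runtime might apply a vendor-supplied device description to a container specification. It is not taken from the linked proposal: the field names (`kind`, `device_nodes`, `env`, `mounts`), the `vendor.example/fpga` device, and the `apply_device` helper are all hypothetical and only illustrate the general shape of the approach.

```python
from copy import deepcopy

# Hypothetical vendor-supplied device description. Every field name here is
# an illustrative assumption, not something defined by the CDI proposal.
EXAMPLE_DEVICE_SPEC = {
    "kind": "vendor.example/fpga",
    "devices": {
        "fpga0": {
            "device_nodes": ["/dev/fpga0"],
            "env": ["FPGA_VISIBLE_DEVICES=0"],
            "mounts": [
                {"host_path": "/opt/fpga/lib", "container_path": "/usr/lib/fpga"},
            ],
        },
    },
}


def apply_device(container_spec, device_spec, device_name):
    """Return a copy of container_spec with the named device's edits applied.

    The runtime, not the orchestrator, performs this merge, so it has full
    access to the container specification.
    """
    devices = device_spec.get("devices", {})
    if device_name not in devices:
        kind = device_spec.get("kind", "<unknown>")
        raise KeyError(f"unknown device {device_name!r} for kind {kind!r}")
    edits = devices[device_name]

    # Work on a deep copy so the caller's specification is left untouched.
    spec = deepcopy(container_spec)
    spec.setdefault("devices", []).extend(edits.get("device_nodes", []))
    spec.setdefault("env", []).extend(edits.get("env", []))
    spec.setdefault("mounts", []).extend(deepcopy(edits.get("mounts", [])))
    return spec


if __name__ == "__main__":
    base = {"image": "busybox", "env": ["PATH=/bin"]}
    print(apply_device(base, EXAMPLE_DEVICE_SPEC, "fpga0"))
```

Because the merge happens inside the runtime in this sketch, the same device description could in principle serve both a standalone runtime and Kubernetes going through the CRI.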

For now I've summarized this here: https://github.com/RenaudWasTaken/cdi
Let me know if this is something that fits this group's purpose and if people are interested in helping push this!

One of the questions that surfaced was whether this work is K8s-specific.
A huge component of the discussion is about designing (and implementing) a specification for runtimes to support third-party plugins.

There is, however, a smaller component that probably fits better under sig-node: transitioning the device-plugin system to this new CDI system.

cc @raravena80 @kad

Thanks!

k82cn commented

... Kubernetes could re-use this support through the CRI.
Let me know if this is something that fits this group's purpose and if people are interested in helping push this!
... One of the questions that surfaced was whether this work is K8s-specific. ...

Overall, this fits sig-runtime's scope; I think one of the questions is "both CRI & device-plugin are K8s features, why bring them to CNCF?" :)

It would be better to bring this to our meeting and share more background with the stakeholders before the next step.

RenaudWasTaken commented

Overall, this fits sig-runtime's scope; I think one of the questions is "both CRI & device-plugin are K8s features, why bring them to CNCF?" :)

Sorry if my message was confusing, I was trying to point out some of the motivations that led to this idea and these discussions, as well as the different use cases that people have in mind.

I've added this topic to tomorrow's (or today's, depending on the timezone) meeting!

k82cn commented

Sorry if my message was confusing, I was trying to point out some of the motivations that led to this idea and these discussions, as well as the different use cases that people have in mind.

It would be great if you could highlight the key discussion points and use cases; I'm not sure whether others have time to go through the whole discussion :)

RenaudWasTaken commented

Still a work in progress, and it will likely change after I present it, but here are the slides I'll be presenting: https://docs.google.com/presentation/d/1UXgKYx5AA9ThYYLDswHsXFDL7TrNzVcac8zjlMIfEDs/edit#slide=id.g8392ef75dc_0_8

Closing in favor of: #24