kubernetes-sigs/node-feature-discovery-operator

Support additional custom source configurations

slintes opened this issue · 13 comments

What would you like to be added:
Configuration options for the NFD worker were recently extended: it now supports a /etc/kubernetes/node-feature-discovery/custom.d directory, which can hold additional custom source configurations besides those in the main worker config. Typically this directory will be populated by mounting ConfigMaps. The operator should handle the required volume and mount configuration.
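
To make the mechanics concrete, here is a minimal sketch (Go, using the upstream corev1 types) of the volume and mount the operator would have to add to the worker DaemonSet for one such ConfigMap. The ConfigMap name and volume name are placeholders, not a proposed convention:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// customConfigVolume sketches the extra volume and mount the operator could
// inject into the worker DaemonSet for one user-provided ConfigMap.
// The volume name "custom-rules" is a placeholder.
func customConfigVolume(configMapName string) (corev1.Volume, corev1.VolumeMount) {
	vol := corev1.Volume{
		Name: "custom-rules",
		VolumeSource: corev1.VolumeSource{
			ConfigMap: &corev1.ConfigMapVolumeSource{
				LocalObjectReference: corev1.LocalObjectReference{Name: configMapName},
			},
		},
	}
	mount := corev1.VolumeMount{
		Name:     "custom-rules",
		ReadOnly: true,
		// Directory the NFD worker scans for additional custom source configs.
		MountPath: "/etc/kubernetes/node-feature-discovery/custom.d",
	}
	return vol, mount
}

func main() {
	// "my-custom-rules" is a hypothetical ConfigMap name, used only for illustration.
	vol, mount := customConfigVolume("my-custom-rules")
	fmt.Printf("volume %q -> mounted at %s\n", vol.Name, mount.MountPath)
}
```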

Why is this needed:
The new configuration option allows dynamic configuration by potentially multiple parties. Each party can maintain their own ConfigMap without the need for cross party agreements. Ideally the operator can watch for those ConfigMaps and reconfigure the worker on the fly by adding and removing relevant volumes and mounts.

Design considerations
The big question is: how to find the relevant ConfigMaps. Some thoughts:

  • add the ConfigMap names to the NFD CRD. This makes the implementation easy, but it isn't very dynamic and still requires manual work, which introduces some risk that in case of misconfiguration (a name mismatch between the config and the actual ConfigMap, accidental deletion of ConfigMaps, ...?) the NFD workers won't start because a volume mount fails.
  • let the operator find the relevant ConfigMaps and (re)configure the worker DaemonSet dynamically. However, we still need to know which ConfigMaps are relevant. The first restriction is easy: the ConfigMap needs to be in the same namespace as the NFD CR / worker. And then? A very basic check might be to look into the data and try to parse it as a custom source configuration, or at least look for e.g. the "matchOn" string (a trivial sketch of such a check follows after this list). An alternative might be to require the ConfigMaps to have a certain name, e.g. a "custom-config-" prefix (maybe configurable in the NFD CR), or a label, or an annotation.
  • what worries me a bit: for this dynamic solution the operator would need to watch for all ConfigMaps in all namespaces. At least I did not find how to dynamically restrict the watch to the desired namespace(s). Is this a problem (thinking of huge clusters with many CMs...)?
  • side note: talking about namespaces: do I see correctly that the operator watches ALL namespaces for NFD CRs and potentially installs multiple instances of NFD in multiple namespaces? Is that on purpose?
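
To illustrate the data-sniffing idea mentioned above, here is a deliberately naive sketch of such a heuristic; the embedded example data is only illustrative, not a statement about the exact config format:

```go
package main

import (
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// looksLikeCustomConfig is a naive heuristic: it only checks whether any data
// entry mentions "matchOn", without actually parsing the content as a custom
// source configuration.
func looksLikeCustomConfig(cm *corev1.ConfigMap) bool {
	for _, data := range cm.Data {
		if strings.Contains(data, "matchOn") {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative ConfigMap data only; the real rule format is defined by the
	// NFD worker's custom source.
	cm := &corev1.ConfigMap{
		Data: map[string]string{
			"custom.conf": "- name: example.feature\n  matchOn:\n    - loadedKMod: [\"example_kmod\"]\n",
		},
	}
	fmt.Println("looks like a custom source config:", looksLikeCustomConfig(cm))
}
```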

I think my favorite is using a marker label for the ConfigMaps that should be mounted.
Once we agree on a way forward, I volunteer to implement it, in order to finish the work that was started on the NFD worker :)
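
As a rough sketch of the marker-label lookup (the label key/value and the namespace below are made up for illustration, not an agreed convention):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// markerLabel is a hypothetical label selector; the actual key/value would
// have to be agreed on (and could be made configurable in the NFD CR).
const markerLabel = "example.com/nfd-custom-config=true"

// listCustomConfigMaps returns the names of the ConfigMaps in the worker's
// namespace that carry the marker label, i.e. the ones the operator would
// mount into custom.d.
func listCustomConfigMaps(ctx context.Context, cs kubernetes.Interface, namespace string) ([]string, error) {
	cms, err := cs.CoreV1().ConfigMaps(namespace).List(ctx, metav1.ListOptions{LabelSelector: markerLabel})
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(cms.Items))
	for _, cm := range cms.Items {
		names = append(names, cm.Name)
	}
	return names, nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)
	// "node-feature-discovery" is an assumed namespace for the example.
	names, err := listCustomConfigMaps(context.Background(), cs, "node-feature-discovery")
	if err != nil {
		panic(err)
	}
	fmt.Println("ConfigMaps to mount:", names)
}
```

A namespaced list/watch like this would also only need RBAC for ConfigMaps in the worker's namespace, which sidesteps the "watch all ConfigMaps in all namespaces" worry from the list above.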

Related: #53

/assign

mythi commented

side note: talking about namespaces: do I see correctly that the operator watches ALL namespaces for NFD CRs and potentially installs multiple instances of NFD in multiple namespaces? Is that on purpose?

your observation is correct, see #54

in case of misconfiguration (a name mismatch between the config and the actual ConfigMap, accidental deletion of ConfigMaps, ...?) the NFD workers won't start because a volume mount fails.

I learned there is an optional flag for ConfigMap mounts, so this isn't an issue.
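
For reference, that is the optional field on ConfigMap volume sources. A small sketch of how it would look on the volume from the issue description (names are again placeholders):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// With Optional set, the kubelet starts the worker pod even if the named
	// ConfigMap does not exist; its data is simply not mounted.
	optional := true
	vol := corev1.Volume{
		Name: "custom-rules",
		VolumeSource: corev1.VolumeSource{
			ConfigMap: &corev1.ConfigMapVolumeSource{
				LocalObjectReference: corev1.LocalObjectReference{Name: "my-custom-rules"},
				Optional:             &optional,
			},
		},
	}
	fmt.Println("optional ConfigMap mount:", *vol.VolumeSource.ConfigMap.Optional)
}
```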

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Let's not close this yet
/remove-lifecycle stale

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

/remove-lifecycle stale

Hmm, do we still want to keep this open? Especially after NFD v0.10.0 (and #653)? I'd say no.

If we implemented this, it would probably mean a separate CRD for the extra custom configs, which doesn't make much sense after kubernetes-sigs/node-feature-discovery#653: in practice overlapping functionality and more maintenance burden.

Thoughts @slintes @ArangoGutierrez?

thanks for the heads up

in practice overlapping functionality and more maintenance burden

sounds like a good argument to me to close this

Let's close this after #119

Let's close this after #119

Agree

#119 is merged, we can say this issue has been properly addressed
/close

@ArangoGutierrez: Closing this issue.

In response to this:

#119 is merged, we can say this issue has been properly addressed
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.