kubernetes-sigs/node-feature-discovery-operator

Support additional custom source configurations

slintes opened this issue · 13 comments

What would you like to be added:
Configuration options for the NFD worker were recently extended: it now supports a /etc/kubernetes/node-feature-discovery/custom.d directory, which can hold additional custom source configurations besides those in the main worker config. Typically this directory will be populated by mounting ConfigMaps. The operator should handle the required volume and mount configuration.
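
To make the mechanics concrete, here is a minimal sketch (Go, using the upstream corev1 types) of the volume and mount the operator would have to add to the worker DaemonSet for one such ConfigMap. The ConfigMap name and volume name are placeholders, not a proposed convention:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// customConfigVolume sketches the extra volume and mount the operator could
// inject into the worker DaemonSet for one user-provided ConfigMap.
// The volume name "custom-rules" is a placeholder.
func customConfigVolume(configMapName string) (corev1.Volume, corev1.VolumeMount) {
	vol := corev1.Volume{
		Name: "custom-rules",
		VolumeSource: corev1.VolumeSource{
			ConfigMap: &corev1.ConfigMapVolumeSource{
				LocalObjectReference: corev1.LocalObjectReference{Name: configMapName},
			},
		},
	}
	mount := corev1.VolumeMount{
		Name:     "custom-rules",
		ReadOnly: true,
		// Directory the NFD worker scans for additional custom source configs.
		MountPath: "/etc/kubernetes/node-feature-discovery/custom.d",
	}
	return vol, mount
}

func main() {
	// "my-custom-rules" is a hypothetical ConfigMap name, used only for illustration.
	vol, mount := customConfigVolume("my-custom-rules")
	fmt.Printf("volume %q -> mounted at %s\n", vol.Name, mount.MountPath)
}
```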

Why is this needed:
The new configuration option allows dynamic configuration by potentially multiple parties. Each party can maintain their own ConfigMap without the need for cross party agreements. Ideally the operator can watch for those ConfigMaps and reconfigure the worker on the fly by adding and removing relevant volumes and mounts.

Design considerations
The big question is: how to find the relevant ConfigMaps. Some thoughts:

  • add the ConfigMap names to the NFD CRD. This makes the implementation easy, but it isn't very dynamic and still requires manual work, which introduces some risk that in case of misconfiguration (a name mismatch between the config and the actual ConfigMap, accidental deletion of ConfigMaps, ...?) the NFD workers won't start because a volume mount fails.
  • let the operator find the relevant ConfigMaps and (re)configure the worker DaemonSet dynamically. However, we still need to know which ConfigMaps are relevant. The first restriction is easy: the ConfigMap needs to be in the same namespace as the NFD CR / worker. And then? A very basic check might be to look into the data and try to parse it as a custom source configuration, or at least look for e.g. the "matchOn" string (a trivial sketch of such a check follows after this list). An alternative might be to require the ConfigMaps to have a certain name, e.g. a "custom-config-" prefix (maybe configurable in the NFD CR), or a label, or an annotation.
  • what worries me a bit: for this dynamic solution the operator would need to watch for all ConfigMaps in all namespaces. At least I did not find how to dynamically restrict the watch to the desired namespace(s). Is this a problem (thinking of huge clusters with many CMs...)?
  • side note: talking about namespaces: do I see correctly that the operator watches ALL namespaces for NFD CRs and potentially installs multiple instances of NFD in multiple namespaces? Is that on purpose?
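
To illustrate the data-sniffing idea mentioned above, here is a deliberately naive sketch of such a heuristic; the embedded example data is only illustrative, not a statement about the exact config format:

```go
package main

import (
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
)

// looksLikeCustomConfig is a naive heuristic: it only checks whether any data
// entry mentions "matchOn", without actually parsing the content as a custom
// source configuration.
func looksLikeCustomConfig(cm *corev1.ConfigMap) bool {
	for _, data := range cm.Data {
		if strings.Contains(data, "matchOn") {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative ConfigMap data only; the real rule format is defined by the
	// NFD worker's custom source.
	cm := &corev1.ConfigMap{
		Data: map[string]string{
			"custom.conf": "- name: example.feature\n  matchOn:\n    - loadedKMod: [\"example_kmod\"]\n",
		},
	}
	fmt.Println("looks like a custom source config:", looksLikeCustomConfig(cm))
}
```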

I think my favorite is using a marker label for the ConfigMaps that should be mounted.
Once we agree on a way forward, I volunteer to implement it, in order to finish the work that was started on the NFD worker :)
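
As a rough sketch of the marker-label lookup (the label key/value and the namespace below are made up for illustration, not an agreed convention):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// markerLabel is a hypothetical label selector; the actual key/value would
// have to be agreed on (and could be made configurable in the NFD CR).
const markerLabel = "example.com/nfd-custom-config=true"

// listCustomConfigMaps returns the names of the ConfigMaps in the worker's
// namespace that carry the marker label, i.e. the ones the operator would
// mount into custom.d.
func listCustomConfigMaps(ctx context.Context, cs kubernetes.Interface, namespace string) ([]string, error) {
	cms, err := cs.CoreV1().ConfigMaps(namespace).List(ctx, metav1.ListOptions{LabelSelector: markerLabel})
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(cms.Items))
	for _, cm := range cms.Items {
		names = append(names, cm.Name)
	}
	return names, nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)
	// "node-feature-discovery" is an assumed namespace for the example.
	names, err := listCustomConfigMaps(context.Background(), cs, "node-feature-discovery")
	if err != nil {
		panic(err)
	}
	fmt.Println("ConfigMaps to mount:", names)
}
```

A namespaced list/watch like this would also only need RBAC for ConfigMaps in the worker's namespace, which sidesteps the "watch all ConfigMaps in all namespaces" worry from the list above.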

Related: #53

/assign

mythi commented

side note: talking about namespaces: do I see correctly that the operator watches ALL namespaces for NFD CRs and potentially installs multiple instances of NFD in multiple namespaces? Is that on purpose?

your observation is correct, see #54

in case of misconfiguration (a name mismatch between the config and the actual ConfigMap, accidental deletion of ConfigMaps, ...?) the NFD workers won't start because a volume mount fails.

I learned there is an optional flag for ConfigMap mounts, so this isn't an issue.
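
For reference, that is the optional field on ConfigMap volume sources. A small sketch of how it would look on the volume from the issue description (names are again placeholders):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// With Optional set, the kubelet starts the worker pod even if the named
	// ConfigMap does not exist; its data is simply not mounted.
	optional := true
	vol := corev1.Volume{
		Name: "custom-rules",
		VolumeSource: corev1.VolumeSource{
			ConfigMap: &corev1.ConfigMapVolumeSource{
				LocalObjectReference: corev1.LocalObjectReference{Name: "my-custom-rules"},
				Optional:             &optional,
			},
		},
	}
	fmt.Println("optional ConfigMap mount:", *vol.VolumeSource.ConfigMap.Optional)
}
```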

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Let's not close this yet
/remove-lifecycle stale

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

/remove-lifecycle stale

Hmm, do we still want to keep this open? Especially after NFD v0.10.0 (and #653)? I'd say no.

If we implemented this, it would probably mean a separate CRD for the extra custom configs, which doesn't make much sense after kubernetes-sigs/node-feature-discovery#653: in practice overlapping functionality and more maintenance burden.

Thoughts @slintes @ArangoGutierrez?

thanks for the heads up

in practice overlapping functionality and more maintenance burden

sounds like a good argument to me to close this

Let's close this after #119

Let's close this after #119

Agree

#119 is merged, we can say this issue has been properly addressed
/close

@ArangoGutierrez: Closing this issue.

In response to this:

#119 is merged, we can say this issue has been properly addressed
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.