Support additional custom source configurations
slintes opened this issue · 13 comments
What would you like to be added:
Recently the configuration options of the NFD worker were extended: it now supports a /etc/kubernetes/node-feature-discovery/custom.d
directory, which can be used for additional custom source configurations besides those in the main worker config. Typically this directory will be populated by mounting ConfigMaps. The operator should handle the required volume and volume mount configuration.
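For illustration, a drop-in custom source configuration delivered as a ConfigMap might look like the following sketch (all names here are hypothetical; the rule follows the worker's custom source syntax, matching on a loaded kernel module):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-custom-rules              # hypothetical name
  namespace: node-feature-discovery  # must match the worker's namespace
data:
  my-rules.conf: |
    - name: "my.kernel.feature"
      matchOn:
        - loadedKMod: ["example_kmod"]
```

The operator would then mount this ConfigMap into /etc/kubernetes/node-feature-discovery/custom.d of the worker pods.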
Why is this needed:
The new configuration option allows dynamic configuration by potentially multiple parties. Each party can maintain their own ConfigMap without the need for cross party agreements. Ideally the operator can watch for those ConfigMaps and reconfigure the worker on the fly by adding and removing relevant volumes and mounts.
Design considerations
The big question is: how to find the relevant ConfigMaps. Some thoughts:
- Add the ConfigMap names to the NFD CRD. This makes the implementation easy, but it isn't very dynamic and still needs manual work, which introduces the risk that in case of misconfiguration (a name mismatch between the config and the actual ConfigMap, accidental deletion of ConfigMaps, ...?) the NFD workers won't start because a volume mount fails.
- Let the operator find relevant ConfigMaps and (re)configure the worker DaemonSet dynamically. However, we still need to know which ConfigMaps are relevant. The first restriction is easy: the ConfigMap needs to be in the same namespace as the NFD CR / worker. And then? A very basic check might be to look into the data and try to parse it as a custom source configuration, or at least look for e.g. the "matchOn" string. An alternative might be to require the ConfigMaps to have a certain name, e.g. a "custom-config-" prefix (maybe configurable in the NFD CR), or a label, or an annotation.
- What worries me a bit: for this dynamic solution the operator would need to watch all ConfigMaps in all namespaces. At least I did not find a way to dynamically restrict the watch to the desired namespace(s). Is this a problem (thinking of huge clusters with many ConfigMaps...)?
- Side note, talking about namespaces: do I see correctly that the operator watches ALL namespaces for NFD CRs and potentially installs multiple instances of NFD in multiple namespaces? Is that on purpose?
I think my favorite is using a marker label on the ConfigMaps which should be mounted.
Once we agree on a way forward, I volunteer to implement it, in order to finish the work that was started on the NFD worker :)
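As a sketch of the marker label idea (the label key below is a made-up example, not an agreed-upon name), a ConfigMap would opt in like this, and the operator could restrict its list/watch to that label selector:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-custom-rules                            # hypothetical name
  labels:
    nfd.node.kubernetes.io/custom-config: "true"   # hypothetical marker label
data:
  my-rules.conf: |
    - name: "my.kernel.feature"
      matchOn:
        - loadedKMod: ["example_kmod"]
```

A label selector on the list/watch would also mitigate the scalability concern above, since the API server would only deliver the labeled ConfigMaps to the operator.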
Related: #53
/assign
> Side note, talking about namespaces: do I see correctly that the operator watches ALL namespaces for NFD CRs and potentially installs multiple instances of NFD in multiple namespaces? Is that on purpose?
your observation is correct, see #54
> in case of misconfiguration (name mismatch between config and actual ConfigMap, accidental deletion of ConfigMaps, ...?) the NFD workers won't start because a volume mount fails.
I learned there is an `optional` flag for ConfigMap volume mounts, so this isn't an issue.
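A sketch of the resulting volume in the worker DaemonSet, with the optional flag set so a missing ConfigMap does not block pod startup (names hypothetical):

```yaml
# Fragment of the nfd-worker DaemonSet pod spec
volumes:
  - name: my-custom-rules
    configMap:
      name: my-custom-rules
      optional: true   # pod starts even if the ConfigMap is missing
containers:
  - name: nfd-worker
    volumeMounts:
      - name: my-custom-rules
        mountPath: "/etc/kubernetes/node-feature-discovery/custom.d"
        readOnly: true
```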
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Let's not close this yet
/remove-lifecycle stale
/remove-lifecycle stale
Hmm, do we still want to keep this open? Especially after NFD v0.10.0 (and #653)? I'd say no.
If we implemented this, it would probably mean a separate CRD for the extra custom configs, which doesn't make much sense after kubernetes-sigs/node-feature-discovery#653: in practice it means overlapping functionality and more maintenance burden.
Thoughts @slintes @ArangoGutierrez?
thanks for the heads up
> in practice overlapping functionality and more maintenance burden
sounds like a good argument to me to close this
Let's close this after #119
#119 is merged, we can say this issue has been properly addressed
/close
@ArangoGutierrez: Closing this issue.
In response to this:
> #119 is merged, we can say this issue has been properly addressed
> /close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.