Support exposing the status of individual resources applied on cluster
varshaprasad96 opened this issue · 3 comments
Describe the problem/challenge you have
Once the package contents fetched from the source are applied on the cluster through the App CR, there is no way to know the health of the individual resources that were applied. It would be helpful to expose the status of each applied resource in the "status" section of the App CR. This would be useful for SRE/ops teams managing hundreds or thousands of clusters, where this information can be scraped for monitoring.
Describe the solution you'd like
The controller that manages App can also watch the individual resources it applies, using informers. For core types such as Deployments and Pods, where health information is already available, that information can be used and stamped on the App CR's status.
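As a rough illustration of what "health information is already available" could mean for a Deployment, here is a minimal sketch that derives a health entry from the Deployment's own spec and status fields. The `ResourceHealth` type and the `deploymentHealth` function are hypothetical names made up for this example, not part of kapp-controller, and the rule used (latest generation observed, all replicas updated and available) is only one reasonable choice.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ResourceHealth is a hypothetical entry that could be stamped on the App CR status.
type ResourceHealth struct {
	Kind    string `json:"kind"`
	Name    string `json:"name"`
	Healthy bool   `json:"healthy"`
	Message string `json:"message,omitempty"`
}

// deployment mirrors only the Deployment fields this sketch reads.
type deployment struct {
	Metadata struct {
		Name       string `json:"name"`
		Generation int64  `json:"generation"`
	} `json:"metadata"`
	Spec struct {
		Replicas *int32 `json:"replicas"`
	} `json:"spec"`
	Status struct {
		ObservedGeneration int64 `json:"observedGeneration"`
		UpdatedReplicas    int32 `json:"updatedReplicas"`
		AvailableReplicas  int32 `json:"availableReplicas"`
	} `json:"status"`
}

// deploymentHealth decodes a Deployment manifest (JSON) and reports whether
// it looks healthy under the simplified rule described above.
func deploymentHealth(raw []byte) (ResourceHealth, error) {
	var d deployment
	if err := json.Unmarshal(raw, &d); err != nil {
		return ResourceHealth{}, err
	}
	want := int32(1) // spec.replicas defaults to 1 when unset
	if d.Spec.Replicas != nil {
		want = *d.Spec.Replicas
	}
	h := ResourceHealth{Kind: "Deployment", Name: d.Metadata.Name}
	switch {
	case d.Status.ObservedGeneration < d.Metadata.Generation:
		h.Message = "latest spec not yet observed"
	case d.Status.UpdatedReplicas < want:
		h.Message = fmt.Sprintf("%d/%d replicas updated", d.Status.UpdatedReplicas, want)
	case d.Status.AvailableReplicas < want:
		h.Message = fmt.Sprintf("%d/%d replicas available", d.Status.AvailableReplicas, want)
	default:
		h.Healthy = true
	}
	return h, nil
}

func main() {
	raw := []byte(`{
	  "metadata": {"name": "web", "generation": 2},
	  "spec": {"replicas": 3},
	  "status": {"observedGeneration": 2, "updatedReplicas": 3, "availableReplicas": 1}
	}`)
	h, err := deploymentHealth(raw)
	if err != nil {
		panic(err)
	}
	out, _ := json.Marshal(h)
	fmt.Println(string(out))
}
```

In this sample the Deployment has only one of three replicas available, so the entry is reported as unhealthy with the message "1/3 replicas available". A real controller would receive the object from an informer rather than from a JSON literal.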
Anything else you would like to add:
A similar feature is available in RukPak, a component of OLM v1. kubernetes-sigs/cli-utils (https://github.com/kubernetes-sigs/cli-utils/tree/master/pkg/kstatus) provides a set of helpers for collecting status from core resource types. More details on the implementation can be found here: https://github.com/operator-framework/rukpak/blob/main/internal/healthchecks/builtin.go#L16-L33
Open questions:
- Should we trigger a reconcile when any of the resources is unhealthy?
- Would watching resources through informers increase cache, thereby affecting performance?
Another open question: how would we want to expose status of non-builtin APIs. For example, consider a MongoDB CR that is managed by an App. Do we need a way for package authors to define how to scrape health from custom objects?
I know RukPak does not support this. RukPak just treats unknown types as permanently healthy. Point being, I don't think we necessarily have to include custom types in the scope of this, but something we might want to keep in mind in the design.
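The "unknown types are healthy" fallback described above can be sketched as a lookup table of per-kind checks with a default. Everything here (`Object`, `HealthFunc`, `builtinChecks`, `isHealthy`) is a hypothetical name for illustration and is not RukPak's or kapp-controller's actual code. The Pod check is deliberately simplified.

```go
package main

import "fmt"

// Object is a hypothetical, stripped-down view of a cluster resource.
type Object struct {
	Kind   string
	Name   string
	Status map[string]interface{}
}

// HealthFunc decides whether a single object is healthy.
type HealthFunc func(Object) bool

// builtinChecks holds checks for kinds whose status the controller understands.
var builtinChecks = map[string]HealthFunc{
	// Simplified: a real check would also consider readiness and completed Pods.
	"Pod": func(o Object) bool { return o.Status["phase"] == "Running" },
}

// isHealthy uses the builtin check when one exists and otherwise treats
// the object as healthy, because nothing is known about its status schema.
func isHealthy(o Object) bool {
	if check, ok := builtinChecks[o.Kind]; ok {
		return check(o)
	}
	return true
}

func main() {
	objs := []Object{
		{Kind: "Pod", Name: "web-0", Status: map[string]interface{}{"phase": "Pending"}},
		{Kind: "MongoDB", Name: "db", Status: map[string]interface{}{"state": "Degraded"}},
	}
	for _, o := range objs {
		fmt.Printf("%s/%s healthy=%v\n", o.Kind, o.Name, isHealthy(o))
	}
}
```

Note the consequence of the fallback: the degraded MongoDB object is still reported as healthy. Keeping the checks in a map leaves room to register custom checks for such types later.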
@joelanford The other option (inspired from package-operator) as discussed in this thread was to allow users to pass in CEL expressions, which the controller will evaluate to decide if the resource is healthy or not.
> Do we need a way for package authors to define how to scrape health from custom objects?
The go-to way for us has been to include some config, consumed by kapp, that lets us specify when a resource counts as ready based on its status (we call these waitRules).
This would mean that kapp-controller would only mark the Package as reconciled if these conditions are met.
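To make the idea of "depending on the status of this resource, when is it ready" concrete, here is a minimal sketch of condition matching in Go. It is only loosely inspired by the waitRules idea; the `ConditionMatcher` type, its fields, and the `evaluate` function are assumptions for this example and do not reproduce kapp's actual schema or implementation.

```go
package main

import "fmt"

// Condition is a simplified status condition as found on many custom resources.
type Condition struct {
	Type   string
	Status string
}

// ConditionMatcher is a hypothetical rule: when a condition with this Type and
// Status is present, the resource is considered succeeded or failed.
type ConditionMatcher struct {
	Type    string
	Status  string
	Success bool
	Failure bool
}

// State is the outcome of evaluating the matchers against a resource.
type State string

const (
	Ready       State = "Ready"
	Failed      State = "Failed"
	Progressing State = "Progressing"
)

// matches reports whether any condition has the matcher's type and status.
func matches(conds []Condition, m ConditionMatcher) bool {
	for _, c := range conds {
		if c.Type == m.Type && c.Status == m.Status {
			return true
		}
	}
	return false
}

// evaluate checks failure matchers first so that a failed resource is never
// reported as ready, then success matchers, and otherwise keeps waiting.
func evaluate(conds []Condition, matchers []ConditionMatcher) State {
	for _, m := range matchers {
		if m.Failure && matches(conds, m) {
			return Failed
		}
	}
	for _, m := range matchers {
		if m.Success && matches(conds, m) {
			return Ready
		}
	}
	return Progressing
}

func main() {
	// Hypothetical rules a package author might supply for a MongoDB CR.
	rules := []ConditionMatcher{
		{Type: "Ready", Status: "True", Success: true},
		{Type: "Failed", Status: "True", Failure: true},
	}
	conds := []Condition{{Type: "Ready", Status: "False"}}
	fmt.Println(evaluate(conds, rules))
}
```

With the sample conditions, neither rule matches, so the resource is still `Progressing`, and under the behaviour described above the Package would not yet be marked as reconciled.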
However, surfacing resource-specific conditions (which seems to be the goal here) is something we might need to think about. This would require a new API (probably within the same config) specifying how the statuses of certain resources are converted into additional conditions on the PackageInstall itself.
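One possible shape for that conversion is sketched below: per-resource results are folded into a single extra condition. The `ResourceStatus` type, the `ResourcesHealthy` condition type, and the reason strings are all invented for this example; no such API exists in kapp-controller today.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ResourceStatus is a hypothetical per-resource health result.
type ResourceStatus struct {
	Kind    string
	Name    string
	Healthy bool
}

// Condition is a simplified condition as it might appear on a PackageInstall.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// summarize folds per-resource results into one condition and lists the
// unhealthy resources in a stable (sorted) order.
func summarize(resources []ResourceStatus) Condition {
	var unhealthy []string
	for _, r := range resources {
		if !r.Healthy {
			unhealthy = append(unhealthy, r.Kind+"/"+r.Name)
		}
	}
	if len(unhealthy) == 0 {
		return Condition{Type: "ResourcesHealthy", Status: "True", Reason: "AllResourcesHealthy"}
	}
	sort.Strings(unhealthy)
	return Condition{
		Type:   "ResourcesHealthy",
		Status: "False",
		Reason: "UnhealthyResources",
		Message: fmt.Sprintf("%d of %d resources unhealthy: %s",
			len(unhealthy), len(resources), strings.Join(unhealthy, ", ")),
	}
}

func main() {
	cond := summarize([]ResourceStatus{
		{Kind: "MongoDB", Name: "db", Healthy: false},
		{Kind: "ConfigMap", Name: "cfg", Healthy: true},
		{Kind: "Deployment", Name: "web", Healthy: false},
	})
	fmt.Printf("%s=%s (%s): %s\n", cond.Type, cond.Status, cond.Reason, cond.Message)
}
```

A single aggregated condition keeps the status small on clusters with many resources. The alternative, one condition per resource, gives finer detail for monitoring at the cost of a larger status object.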