replicatedhq/troubleshoot

Ability to display informational values from a JSON file or jq query

Closed this issue · 4 comments

Describe the rationale for the suggested feature.

As an operator (or an app packager), I'd like to be able to provide some high level insights to users, like number of nodes in the cluster, total pods, etc.

  • There are 22 nodes in the cluster
  • Total memory is 155G
  • Total CPU is 45 cores
  • There are 110 healthy pods, 3 in Pending, 2 in ImagePullBackoff and 2 in CrashLoopBackoff
  • The LoadBalancer Service is exposed at 35.122.12.54

Right now, analyzer titles are sometimes determined by CheckName and sometimes auto-generated by the check itself (e.g. Cluster Pod Statuses).

Describe the feature

  • Make Title fully controllable and templatable, so that a title like "2 nodes in cluster" can be produced with {{ .Data.Nodes | len }} nodes in the cluster, etc.
  • Allow passing raw JSON/YAML content, or some subset of values from a JSON file.
  • We'll probably want to do some fancy math in some of the expressions, like counting/filtering (e.g. 10 healthy nodes, 2 Not Ready / Unreachable).
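To make the counting/filtering point concrete, here is a rough Go sketch of the kind of computation such an expression would need to do. It assumes the collected file has the shape of a Kubernetes NodeList (items[].status.conditions); the countReady helper and the sample data are hypothetical and not part of troubleshoot.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodeList mirrors only the NodeList fields needed for this count.
type nodeList struct {
	Items []struct {
		Status struct {
			Conditions []struct {
				Type   string `json:"type"`
				Status string `json:"status"`
			} `json:"conditions"`
		} `json:"status"`
	} `json:"items"`
}

// countReady (hypothetical helper) returns how many nodes report
// Ready=True and how many do not.
func countReady(raw []byte) (int, int, error) {
	var nodes nodeList
	if err := json.Unmarshal(raw, &nodes); err != nil {
		return 0, 0, err
	}
	ready, notReady := 0, 0
	for _, n := range nodes.Items {
		isReady := false
		for _, c := range n.Status.Conditions {
			if c.Type == "Ready" && c.Status == "True" {
				isReady = true
			}
		}
		if isReady {
			ready++
		} else {
			notReady++
		}
	}
	return ready, notReady, nil
}

func main() {
	// Made-up sample data standing in for cluster-resources/nodes.json.
	raw := []byte(`{"items":[
		{"status":{"conditions":[{"type":"Ready","status":"True"}]}},
		{"status":{"conditions":[{"type":"Ready","status":"Unknown"}]}},
		{"status":{"conditions":[{"type":"Ready","status":"True"}]}}
	]}`)
	ready, notReady, err := countReady(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d healthy nodes, %d Not Ready / Unreachable\n", ready, notReady)
}
```

Whether this logic would live in template functions or in the analyzer itself is an open design question; the sketch only shows what "counting/filtering" amounts to.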

I'd look for some feature like jsonValue as a complement to jsonCompare. Maybe it only supports pass or info status because it's always just informational and doesn't have a good/bad connotation to it?

analyzers:
  - jsonValue:
      checkName: Cluster Nodes
      fileName: cluster-resources/nodes.json
      outcomes:
        - pass:
            message: '{{ .Data.Items | len }} nodes in cluster'
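As a rough illustration of how a message like the one above could be evaluated, here is a minimal Go sketch using the standard text/template package. It assumes the file is parsed into a struct and exposed to the template as .Data; the renderMessage helper, the collected type, and the sample JSON are hypothetical, and this is not how troubleshoot currently works.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"text/template"
)

// collected holds the parsed file; only the items array is needed to count nodes.
type collected struct {
	Items []json.RawMessage `json:"items"`
}

// renderMessage (hypothetical helper) parses raw JSON and renders tmplText
// with the parsed content available as .Data.
func renderMessage(tmplText string, raw []byte) (string, error) {
	var data collected
	if err := json.Unmarshal(raw, &data); err != nil {
		return "", err
	}
	tmpl, err := template.New("message").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var out bytes.Buffer
	// Wrap the parsed file so the template sees it under .Data,
	// matching the syntax proposed in the outcome message.
	if err := tmpl.Execute(&out, struct{ Data collected }{Data: data}); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	// Made-up sample data standing in for cluster-resources/nodes.json.
	raw := []byte(`{"items":[{"metadata":{"name":"node-a"}},{"metadata":{"name":"node-b"}}]}`)
	msg, err := renderMessage("{{ .Data.Items | len }} nodes in cluster", raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(msg) // prints "2 nodes in cluster"
}
```

Note that the JSON key is lowercase items, so .Data.Items only resolves here because the struct field maps to it; with a generic map the template would need .Data.items instead.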

Love the idea, the IP address for load balancer service would be redacted though :)

Maybe this would be a new analyzer, a "cluster summary" type thing, with no need for config other than maybe switching off some lines if folks want that. That way we don't need to fuss with templates and fileName in the analyzer config, just have it as a new line:

analyzers:
  - clusterSummary: {}

@xavpaice a clusterSummary analyzer would be interesting. What sort of stuff would you see that pulling out? NodePort and LoadBalancer services in the NS? Ingress endpoints? What else?

I think the times I've wanted to see cluster summary it's been things to give me some context on what I'm looking at - k8s version, number of workers, number of master nodes, is it kURL or not, and if kURL what is the install spec, that sort of thing. Number of 'interesting' pods is really handy. Probably a different thing to what the customer might want to see if they want some basic info like memory and CPU totals, but I figure maybe that's something they should be getting from Grafana anyway.
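For the node-related parts of that context (number of workers, number of master nodes, k8s version), a summary could be derived from the collected node list roughly as sketched below in Go. This assumes a NodeList-shaped file and the standard node-role.kubernetes.io/control-plane and node-role.kubernetes.io/master labels; the summarize function and summary type are hypothetical, and the kURL detection is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// nodeList mirrors only the NodeList fields used for the summary.
type nodeList struct {
	Items []struct {
		Metadata struct {
			Labels map[string]string `json:"labels"`
		} `json:"metadata"`
		Status struct {
			NodeInfo struct {
				KubeletVersion string `json:"kubeletVersion"`
			} `json:"nodeInfo"`
		} `json:"status"`
	} `json:"items"`
}

// summary is a hypothetical result type for a cluster summary.
type summary struct {
	ControlPlane    int
	Workers         int
	KubeletVersions []string // distinct versions, sorted
}

// summarize (hypothetical helper) counts nodes by role label and
// collects the distinct kubelet versions they report.
func summarize(raw []byte) (summary, error) {
	var nodes nodeList
	if err := json.Unmarshal(raw, &nodes); err != nil {
		return summary{}, err
	}
	var s summary
	seen := map[string]bool{}
	for _, n := range nodes.Items {
		_, cp := n.Metadata.Labels["node-role.kubernetes.io/control-plane"]
		_, master := n.Metadata.Labels["node-role.kubernetes.io/master"]
		if cp || master {
			s.ControlPlane++
		} else {
			s.Workers++
		}
		if v := n.Status.NodeInfo.KubeletVersion; v != "" && !seen[v] {
			seen[v] = true
			s.KubeletVersions = append(s.KubeletVersions, v)
		}
	}
	sort.Strings(s.KubeletVersions)
	return s, nil
}

func main() {
	// Made-up sample data standing in for cluster-resources/nodes.json.
	raw := []byte(`{"items":[
		{"metadata":{"labels":{"node-role.kubernetes.io/control-plane":""}},
		 "status":{"nodeInfo":{"kubeletVersion":"v1.27.3"}}},
		{"metadata":{"labels":{}},
		 "status":{"nodeInfo":{"kubeletVersion":"v1.27.3"}}}
	]}`)
	s, err := summarize(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d control plane, %d workers, kubelet versions: %v\n",
		s.ControlPlane, s.Workers, s.KubeletVersions)
}
```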