Kuberhealthy

A Kubernetes operator for running synthetic checks as pods. Works great with Prometheus!

Kuberhealthy is a Kubernetes operator for synthetic monitoring and continuous process verification. Write your own tests in any language and Kuberhealthy will run them for you. It automatically creates metrics for Prometheus and includes a simple JSON status page. Now part of the CNCF!

What is Kuberhealthy?

Kuberhealthy lets you continuously verify that your applications and Kubernetes clusters are working as expected. By creating a custom resource (a KuberhealthyCheck) in your cluster, you can easily enable various synthetic tests and get Prometheus metrics for them.

Kuberhealthy comes with lots of useful checks already available to ensure the core functionality of Kubernetes, but checks can be used to test anything you like. We encourage you to write your own check container in any language to test your own applications. It really is quick and easy!

Kuberhealthy serves the status of all checks on a simple JSON status page, a Prometheus metrics endpoint (at /metrics), and supports InfluxDB metric forwarding for integration into your choice of alerting solution.

Installation

Deployment

Kuberhealthy requires Kubernetes 1.16 or above.

Using Plain Ole' YAML

If you just want the rendered default specs without Helm, use one of the static flat files: the plain spec, the spec for Prometheus, or the spec for Prometheus Operator.

Here are the installation commands for those same specs:

# If you don't use Prometheus:
kubectl create namespace kuberhealthy
kubectl apply -f https://raw.githubusercontent.com/kuberhealthy/kuberhealthy/master/deploy/kuberhealthy.yaml

# If you use Prometheus, but not with Prometheus Operator:
kubectl create namespace kuberhealthy
kubectl apply -f https://raw.githubusercontent.com/kuberhealthy/kuberhealthy/master/deploy/kuberhealthy-prometheus.yaml

# If you use Prometheus Operator:
kubectl create namespace kuberhealthy
kubectl apply -f https://raw.githubusercontent.com/kuberhealthy/kuberhealthy/master/deploy/kuberhealthy-prometheus-operator.yaml

Using Helm

kubectl create namespace kuberhealthy
helm repo add kuberhealthy https://kuberhealthy.github.io/kuberhealthy/helm-repos
helm install -n kuberhealthy kuberhealthy kuberhealthy/kuberhealthy

If you have Prometheus:

helm install --set prometheus.enabled=true -n kuberhealthy kuberhealthy kuberhealthy/kuberhealthy

If you have Prometheus via Prometheus Operator:

helm install --set prometheus.enabled=true --set prometheus.serviceMonitor.enabled=true -n kuberhealthy kuberhealthy kuberhealthy/kuberhealthy

Configure Service

After installation, Kuberhealthy will only be available from within the cluster (Type: ClusterIP) at the service URL kuberhealthy.kuberhealthy. To expose Kuberhealthy to clients outside of the cluster, you must edit the service kuberhealthy and set Type: LoadBalancer or otherwise expose the service yourself.

Edit Configuration Settings

You can also edit the Kuberhealthy configmap; Kuberhealthy reloads it automatically. All configmap options are set to their defaults to make configuration easy.

kubectl edit -n kuberhealthy configmap kuberhealthy

See Configured Checks

You can see checks that are configured with kubectl -n kuberhealthy get khcheck. Check status can be accessed by the JSON status page endpoint, or via kubectl -n kuberhealthy get khstate.

Further Configuration

To configure Kuberhealthy after installation, see the configuration documentation.

Details on using the Helm chart are documented here. The Helm installation of Kuberhealthy is automatically updated to use the latest Kuberhealthy release.

More installation options, including static YAML files, are available in the /deploy directory. These flat spec files track the master branch, so they contain the most recent changes to Kuberhealthy. Use them if you would like to test master branch updates.

Visualized

Here is an illustration of how Kuberhealthy provisions and operates checker pods. The following process is illustrated:

  • An admin creates a KuberhealthyCheck resource that calls for a synthetic Kubernetes daemonset to be deployed and tested every 15 minutes. This will ensure that all nodes in the Kubernetes cluster can provision containers properly.
  • Kuberhealthy observes this new KuberhealthyCheck resource.
  • Kuberhealthy schedules a checker pod to manage the lifecycle of this check.
  • The checker pod creates a daemonset using the Kubernetes API.
  • The checker pod observes the daemonset and waits for all daemonset pods to become Ready.
  • The checker pod deletes the daemonset using the Kubernetes API.
  • The checker pod observes the daemonset being fully cleaned up and removed.
  • The checker pod reports a successful test result back to Kuberhealthy's API.
  • Kuberhealthy stores this check's state and makes it available to various metrics systems.

Included Checks

You can use any of the pre-made checks by simply enabling them. By default Kuberhealthy comes with several checks to test Kubernetes deployments, daemonsets, and DNS.

Some checks you can easily enable:

  • SSL Handshake Check - checks SSL certificate validity and warns when certs are about to expire.
  • CronJob Scheduling Failures - checks for events indicating that a CronJob has failed to create Job pods.
  • Image Pull Check - checks that an image can be pulled from an image repository.
  • Deployment Check - verifies that a fresh deployment can run, deploy multiple pods, pass traffic, do a rolling update (without dropping connections), and clean up successfully.
  • Daemonset Check - verifies that a daemonset can be created, fully provisioned, and torn down. This checks the full kubelet functionality of every node in your Kubernetes cluster.
  • Storage Provisioner Check - verifies that a pod with persistent storage can be configured on every node in your cluster.

Create Synthetic Checks for Your APIs

You can easily create synthetic tests to check your applications and APIs with real world use cases. This is a great way to be confident that your application functions as expected in the real world at all times.

Here is a full check example written in Go. Just implement doCheckStuff and you're off!

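package main

import (
  "github.com/kuberhealthy/kuberhealthy/v2/pkg/checks/external/checkclient"
)

// doCheckStuff is a placeholder for your own check logic.
// Return true when the check passes and false when it fails.
func doCheckStuff() bool {
  return true
}

func main() {
  ok := doCheckStuff()
  if !ok {
    // Report the failure, with one or more error messages, to Kuberhealthy.
    checkclient.ReportFailure([]string{"Test has failed!"})
    return
  }
  // Report a successful run to Kuberhealthy.
  checkclient.ReportSuccess()
}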

You can read more about how checks are configured and learn how to create your own check container. Checks can be written in any language and helpful clients for checks not written in Go can be found in the clients directory.

Status Page

You can directly access the current test statuses by accessing the kuberhealthy.kuberhealthy HTTP service on port 80. The status page displays server status in the format shown below. The boolean OK field can be used to indicate global up/down status, while the Errors array will contain a list of all check error descriptions. Granular, per-check information, including how long the check took to run (RunDuration), the last time the check was run (LastRun), and the Kuberhealthy pod that ran that specific check (AuthoritativePod), is available under the CheckDetails object.

{
    "OK": true,
    "Errors": [],
    "CheckDetails": {
        "kuberhealthy/daemonset": {
            "OK": true,
            "Errors": [],
            "RunDuration": "22.512278967s",
            "Namespace": "kuberhealthy",
            "LastRun": "2019-11-14T23:24:16.7718171Z",
            "AuthoritativePod": "kuberhealthy-67bf8c4686-mbl2j",
            "uuid": "9abd3ec0-b82f-44f0-b8a7-fa6709f759cd"
        },
        "kuberhealthy/deployment": {
            "OK": true,
            "Errors": [],
            "RunDuration": "29.142295647s",
            "Namespace": "kuberhealthy",
            "LastRun": "2019-11-14T23:26:40.7444659Z",
            "AuthoritativePod": "kuberhealthy-67bf8c4686-mbl2j",
            "uuid": "5f0d2765-60c9-47e8-b2c9-8bc6e61727b2"
        },
        "kuberhealthy/dns-status-internal": {
            "OK": true,
            "Errors": [],
            "RunDuration": "2.43940936s",
            "Namespace": "kuberhealthy",
            "LastRun": "2019-11-14T23:34:04.8927434Z",
            "AuthoritativePod": "kuberhealthy-67bf8c4686-mbl2j",
            "uuid": "c85f95cb-87e2-4ff5-b513-e02b3d25973a"
        },
        "kuberhealthy/pod-restarts": {
            "OK": true,
            "Errors": [],
            "RunDuration": "2.979083775s",
            "Namespace": "kuberhealthy",
            "LastRun": "2019-11-14T23:34:06.1938491Z",
            "AuthoritativePod": "kuberhealthy-67bf8c4686-mbl2j",
            "uuid": "a718b969-421c-47a8-a379-106d234ad9d8"
        }
    },
    "CurrentMaster": "kuberhealthy-7cf79bdc86-m78qr"
}
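
A client that consumes this page might decode it as in the following Go sketch. The struct definitions are inferred from the sample above rather than taken from Kuberhealthy's source, and the names status, checkDetail, parseStatus, and failingChecks are hypothetical. The failing dns-status-internal entry in main is made-up data for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// checkDetail mirrors one entry under CheckDetails in the sample above.
type checkDetail struct {
	OK               bool     `json:"OK"`
	Errors           []string `json:"Errors"`
	RunDuration      string   `json:"RunDuration"`
	Namespace        string   `json:"Namespace"`
	LastRun          string   `json:"LastRun"`
	AuthoritativePod string   `json:"AuthoritativePod"`
	UUID             string   `json:"uuid"`
}

// status mirrors the top-level object of the status page.
type status struct {
	OK            bool                   `json:"OK"`
	Errors        []string               `json:"Errors"`
	CheckDetails  map[string]checkDetail `json:"CheckDetails"`
	CurrentMaster string                 `json:"CurrentMaster"`
}

// parseStatus decodes a status page body.
func parseStatus(data []byte) (status, error) {
	var s status
	err := json.Unmarshal(data, &s)
	return s, err
}

// failingChecks returns the sorted names of all checks whose OK field is false.
func failingChecks(s status) []string {
	var names []string
	for name, detail := range s.CheckDetails {
		if !detail.OK {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	// Trimmed sample; the failing entry is invented for illustration.
	sample := []byte(`{
		"OK": false,
		"Errors": ["DNS lookup failed"],
		"CheckDetails": {
			"kuberhealthy/daemonset": {"OK": true, "Errors": []},
			"kuberhealthy/dns-status-internal": {"OK": false, "Errors": ["DNS lookup failed"]}
		},
		"CurrentMaster": "kuberhealthy-7cf79bdc86-m78qr"
	}`)

	s, err := parseStatus(sample)
	if err != nil {
		fmt.Println("could not parse status page:", err)
		return
	}
	fmt.Println("cluster OK:", s.OK)
	for _, name := range failingChecks(s) {
		fmt.Println("failing check:", name, s.CheckDetails[name].Errors)
	}
}
```

To use this against a live installation, the sample bytes would be replaced by the body of an HTTP GET to the kuberhealthy.kuberhealthy service.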

Contributing

If you're interested in contributing to this project:

  • Check out the Contributing Guide.
  • If you use Kuberhealthy in a production environment, add yourself to the list of Kuberhealthy adopters!
  • Check out open issues. If you're new to the project, look for the good first issue tag.
  • We're always looking for check contributions (either in suggestions or in PRs) as well as feedback from folks implementing Kuberhealthy locally or in a test environment.

Monthly Community Meeting

If you would like to talk directly to the core maintainers to discuss ideas, code reviews, or other complex issues, we have a monthly Zoom meeting on the first Wednesday of the month. Click here to add the meeting to your calendar.