/canary-checker

Kubernetes Native Health Check Platform

Primary LanguageGoApache License 2.0Apache-2.0

Kubernetes Native Health Check Platform


Canary checker is a kubernetes-native platform for monitoring health across application and infrastructure using both passive and active (synthetic) mechanisms.

Features

  • Batteries Included - 35+ built-in check types
  • Kubernetes Native - Health checks (or canaries) are CRD's that reflect health via the status field, making them compatible with GitOps, Flux Health Checks, Argo, Helm, etc..
  • Secret Management - Leverage K8S secrets and configmaps for authentication and connection details
  • Prometheus - Prometheus compatible metrics are exposed at /metrics. A Grafana Dashboard is also available.
  • Dependency Free - Runs an embedded postgres instance by default, can also be configured to use an external database.
  • JUnit Export (CI/CD) - Export health check results to JUnit format for integration into CI/CD pipelines
  • JUnit Import (k6/newman/puppeter/etc) - Use any container that creates JUnit test results
  • Scriptable - Go templates, Javascript and CEL can be used to:
    • Evaluate whether a check is passing and severity to use when failing
    • Extract a user friendly error message
    • Transform and filter check responses into individual check results
    • Extract custom metrics
  • Multi-Modal - While designed as a Kubernetes Operator, canary checker can also run as a CLI and a server without K8s

Getting Started

  1. Install canary checker with Helm
helm repo add flanksource https://flanksource.github.io/charts
helm repo update

helm install \
  canary-checker \
  flanksource/canary-checker \
 -n canary-checker \
 --create-namespace
 --wait
  1. Create a new check
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
name: http-check
spec:
interval: 30
http:
  - name: basic-check
    url: https://httpbin.demo.aws.flanksource.com/status/200
  - name: failing-check
    url: https://httpbin.demo.aws.flanksource.com/status/500

2a. Run the check locally (Optional)

wget  https://github.com/flanksource/canary-checker/releases/latest/download/canary-checker_linux_amd64 \
-O canary-checker &&  chmod +x canary-checker
./canary-checker run canary.yaml

asciicast

  1. Apply the check
kubectl apply -f canary.yaml
  1. Check the health status
kubectl get canary
NAME               INTERVAL   STATUS   LAST CHECK   UPTIME 1H        LATENCY 1H   LAST TRANSITIONED
http-check.        30         Passed   13s          18/18 (100.0%)   480ms        13s

See fixtures for more examples and docs for more comprehensive documentation.

Use Cases

Synthetic Testing

Run simple HTTP/DNS/ICMP probes or more advanced full test suites using JMeter, K6, Playright, Postman.

# Run a container that executes a playwright test, and then collect the
# JUnit formatted test results from the /tmp folder
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: playwright-junit
spec:
  interval: 120
  junit:
    - testResults: "/tmp/"
      name: playwright-junit
      spec:
        containers:
          - name: playwright
            image: ghcr.io/flanksource/canary-playwright:latest

Infrastructure Testing

Verify that infrastructure is fully operational by deploying new pods, spinning up new EC2 instances and pushing/pulling from docker and helm repositories.

# Schedule a new pod with an ingress and then time how long it takes to schedule, be ready, respond to an http request and finally be cleaned up.
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: pod-check
spec:
  interval: 30
  pod:
    - name: golang
      spec: |
        apiVersion: v1
        kind: Pod
        metadata:
          name: hello-world-golang
          namespace: default
          labels:
            app: hello-world-golang
        spec:
          containers:
            - name: hello
              image: quay.io/toni0/hello-webserver-golang:latest
      port: 8080
      path: /foo/bar
      scheduleTimeout: 20000
      readyTimeout: 10000
      httpTimeout: 7000
      deleteTimeout: 12000
      ingressTimeout: 10000
      deadline: 60000
      httpRetryInterval: 200
      expectedContent: bar
      expectedHttpStatuses: [200, 201, 202]

Backup Checks / Batch File Monitoring

Check that batch file processes are functioning correctly by checking the age and size of files in local file systems, SFTP, SMB, S3 and GCS.

# Checks that a recent DB backup has been uploaded
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: folder-check
spec:
  schedule: 0 22 * * *
  folder:
    - path: s3://database-backups/prod
      name: prod-backup
      maxAge: 1d
      minSize: 10gb

Alert Aggregation

Aggregate alerts and recommendations from Prometheus, AWS Cloudwatch, Dynatrace, etc.

apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: alertmanager-check
spec:
  schedule: "*/5 * * * *"
  alertmanager:
    - url: alertmanager.monitoring.svc
      alerts:
        - .*
      ignore:
        - KubeScheduler.*
        - Watchdog
      transform:
        # for each alert, transform it into a new check
        javascript: |
          var out = _.map(results, function(r) {
            return {
              name: r.name,
              labels: r.labels,
              icon: 'alert',
              message: r.message,
              description: r.message,
            }
          })
          JSON.stringify(out);

Prometheus Exporter Replacement

Export custom metrics from the result of any check, making it possible to replace various other promethus exporters that collect metrics via HTTP, SQL, etc..

apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: exchange-rates
spec:
  schedule: "every 1 @hour"
  http:
    - name: exchange-rates
      url: https://api.frankfurter.app/latest?from=USD&to=GBP,EUR,ILS
      metrics:
        - name: exchange_rate
          type: gauge
          value: result.json.rates.GBP
          labels:
            - name: "from"
              value: "USD"
            - name: to
              value: GBP

Platform Ready

Canary checker is ideal for building platforms, developers can include health checks for their applications in whatever tooling they prefer, with secret management that uses native Kubernetes constructs.

apiVersion: v1
kind: Secret
metadata:
  name:  basic-auth
stringData:
   user: john
   pass: doe
---
apiVersion: canaries.flanksource.com/v1
kind: Canary
metadata:
  name: http-basic-auth-configmap
spec:
  http:
    - url: https://httpbin.demo.aws.flanksource.com/basic-auth/john/doe
      username:
        valueFrom:
          secretKeyRef:
            name: basic-auth
            key: user
      password:
        valueFrom:
          secretKeyRef:
            name: basic-auth
            key: pass

Dashboard

Canary checker comes with a built-in dashboard by default

There is also a grafana dashboard, or build your own using the metrics exposed.

Getting Help

If you have any questions about canary checker:

Your feedback is always welcome!

Check Types

Protocol Status Checks
HTTP(s) GA Response body, headers and duration
DNS GA Response and duration
Ping/ICMP GA Duration and packet loss
TCP GA Port is open and connectable
Data Sources
SQL (MySQL, Postgres, SQL Server) GA Ability to login, results, duration, health exposed via stored procedures
LDAP GA Ability to login, response time
ElasticSearch / Opensearch GA Ability to login, response time, size of search results
Mongo Beta Ability to login, results, duration,
Redis GA Ability to login, results, duration,
Prometheus GA Ability to login, results, duration,
Alerts Prometheus
Prometheus Alert Manager GA Pending and firing alerts
AWS Cloudwatch Alarms GA Pending and firing alarms
Dynatrace Problems Beta Problems deteced
DevOps
Git GA Query Git and Github repositories via SQL
Azure Devops Beta
Integration Testing
JMeter Beta Runs and checks the result of a JMeter test
JUnit / BYO Beta Run a pod that saves Junit test results
K6 Beta Runs K6 tests that export JUnit via a container
Newman Beta Runs Newman / Postman tests that export JUnit via a container
Playwright Beta Runs Playwright tests that export JUnit via a container
File Systems / Batch
Local Disk / NFS GA Check folders for files that are: too few/many, too old/new, too small/large
S3 GA Check contents of AWS S3 Buckets
GCS GA Check contents of Google Cloud Storage Buckets
SFTP GA Check contents of folders over SFTP
SMB / CIFS GA Check contents of folders over SMB/CIFS
Config
AWS Config GA Query AWS config using SQL
AWS Config Rule GA AWS Config Rules that are firing, Custom AWS Config queries
Config DB GA Custom config queries for Mission Control Config D
Kubernetes Resources GA Kubernetes resources that are missing or are in a non-ready state
Backups
GCP Databases GA Backup freshness
Restic Beta Backup freshness and integrity
Infrastructure
EC2 GA Ability to launch new EC2 instances
Kubernetes Ingress GA Ability to schedule and then route traffic via an ingress to a pod
Docker/Containerd Deprecated Ability to push and pull containers via docker/containerd
Helm Deprecated Ability to push and pull helm charts
S3 Protocol GA Ability to read/write/list objects on an S3 compatible object store

Contributing

See CONTRIBUTING.md

Thank you to all our contributors !

License

Canary Checker core (the code in this repository) is licensed under Apache 2.0 and accepts contributions via GitHub pull requests after signing a CLA.

The UI (Dashboard) is free to use with canary checker under a license exception of Flanksource UI