Sensu + K8s
A Kubernetes controller that watches cluster resources (Deployment
, Pod
etc.) and runs Sensu checks which are declared in the resource's annotations. For each of the check definitions it will run them at the specified interval, updating the checks with the latest resource info provided by Kubernetes. It communicates directly to Sensu's RabbitMQ endpoint, dynamically adding Sensu proxy clients and optionally removing them on resource deletion or annotation update.
Sens8 effectively acts a Sensu super client. It allows for checks to be run on ephemeral resources such as a pod under a replicaset without having to sidecar the heavy Sensu ruby client, or restart it. By having checks defined in the resources themselves it gives teams greater autonomy and lets checks be pushed out via CD. It also prevents having to manage resource-level checks out of band via configuration management.
Unlike a traditional setup where Sensu's client is installed with check scripts, only a predefined number of checks are supported (i.e., only check definitions are allowed in the annotations). Refer to the check command documentation for what is valid for each resource. Most checks are intended to be run over the network - like making an http request to a pod. For resources such as deployments, only checks based on the status provided by Kubernetes are possible.
A keepalive is run for Sens8 itself, and only itself, since all other checks register under proxy clients. Given its super client nature it's advised to set up alerting/paging appropriately.
make all
Refer to kubernetes.yml
for an example deployment + config
- Only RabbitMQ transport is supported (no Redis)
- If Sens8 crashes, all checks that have a ttl will alert
- Subscriptions are ignored
- Only checks are run. No metrics
- Certain resources types such
Deployment
have a limited feature set and only alert of the status provided by Kubernetes.
See projects for status
Checks are defined in the annotation hootsuite.com/sensu-checks
of a given resource as a json array check objects. The check format is same as Sensu's check configuration minus some fields being ignored. Refer to Sensu's check definition specification as a guide.
field | type | required | example | description |
---|---|---|---|---|
name | string | * | my_check |
Name of the check. Must be unique to the resource it's running on. See Sensu's check naming |
command | string | * | deployment_status --warn 0.9 --crit 0.8 |
Check to run. The first chunk of the string is the check id. All other parameters get parsed by the check itself. Environment variables and backticks (sub-shells) are interpolated. Refer to the check command documentation for valid checks that can be run on the given resource |
interval | int | * | 30 |
Interval in seconds between checks |
deregister | bool | true |
Deregister the proxy client sensu on resource deletion or annotation removal. Default value is dependant on resource type. Pod: true, Deployment: false |
All other (arbitrary) fields get passed along with the check result in the same way Sensu client proper does.
Check results will be registered/de-registered with the following template:
Resource | Source (client name) | Deregister default |
---|---|---|
Deployment | deployment-name.deployment.namespace |
false |
Pod | pod-name.pod.namespace |
true |
Service | sevice-name.service.namespace |
false |
ReplicaSet | replicatset-name.replicatset.namespace |
false |
Node | node-name.node.namespace |
false |
DaemonSet | daemonSet-name.daemonSet.namespace |
false |
Example check annotation
[
{
"name": "deployment_status",
"command": "deployment_status --warn 0.9 --crit 0.8",
"interval": 60,
"deregister": true,
"foo": "bar"
}
]
Example check result
For the given check example above, the check result to Sensu might look like:
[
{
"name": "deployment_status",
"command": "deployment_status --warn 0.9 --crit 0.8",
"interval": 60,
"deregister": true,
"foo": "bar",
"source": "test-app.deployment.default",
"status": 0,
"output": "replicas: 5, updatedReplicas: 5, unavailableReplicas: 0"
}
]
It is possible to set defaults for all checks in Sens8's -config-file
config. Arbitrary fields are respected, but any field defined in the check will override it. It is also possible to set a default for the required interval
. Example:
{
"defaults": {
"interval": 60,
"occurrences": 3,
"refresh": 900,
"handlers": ["default"],
"pager_team": "production_engineering"
}
}
kind: Deployment
apiVersion: extensions/v1beta1
metadata:
labels:
app: test-app
name: test-app
namespace: default
annotations:
hootsuite.com/sensu-checks: |-
[
{
"name": "deployment_status",
"command": "deployment_status --warn 0.9 --crit 0.8",
"interval": 60,
"occurrences": 3,
"refresh": 900,
"handlers": ["default"],
"pager_team": "production_engineering",
"documentation": "http://example.com/runbook.html",
"deregister": true
}
]
spec:
replicas: 1
revisionHistoryLimit: 1
selector:
matchLabels:
app: test-app
template:
metadata:
labels:
app: test-app
annotations:
hootsuite.com/sensu-checks: |-
[
{
"name": "check_pod_dependencies",
"command": "hs_healthcheck --url http://:::POD_IP::::80/status/dependencies",
"interval": 60,
}
]
spec:
containers:
- name: test-app
image: markeijsermans/debug
command: ["python3"]
args: ["-u", "-m", "http.server", "80"]
imagePullPolicy: Always
ports:
- containerPort: 80
protocol: TCP
Each resource is limited to a subset of commands that can be run against it.
or
Get latest docs via: ./sens8 -check-docs
If master branch contains a git tag, circle-ci will tag the docker image as well. Make sure to push the tag, before the commit. This will likely change when circle-ci 2.0 supports git tags.