 ____   _  _____
|  _ \ / \|_   _|
| |_) / _ \ | |
|  __/ ___ \| |
|_| /_/   \_\_|
Prometheus Alert Testing tool
You may also be interested in PromCLI
To install:

go get github.com/kevinjqiu/pat

To build from source, you must have Golang 1.9+ and dep installed. Check out this repo to $GOPATH/src/github.com/kevinjqiu/pat and then:

cd $GOPATH/src/github.com/kevinjqiu/pat && make build
pat [options] <test_yaml_file_glob>
e.g.,
pat test/*.yaml
Test files are written in YAML format. For the complete schema definition (in JSON Schema format), see here.
A test file has the following top-level attributes:

- name - The name of the test case
- rules - The rule definitions that are under test
- fixtures - The fixture setup for the tests
- assertions - The test assertions
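Putting these together, a minimal test file looks roughly like this (a sketch: the rule file name, metric, and values are placeholders; each section is described in detail below):

name: A minimal example test case
rules:
  fromFile: rules.yaml
fixtures:
- duration: 5m
  metrics:
  - some_metric{job="example"} 42
assertions:
- at: 5m
  expected: []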
The rules section defines how the rules-under-test should be loaded.
Currently, two rules loading strategies are supported:
- fromFile - load the rules from a .rules YAML file. If the path specified is not an absolute path, it is resolved relative to the test file; an absolute path is used as-is (see the example below).
- fromLiteral - embed the rules under test right inside the test file.
rules:
  fromFile: http-rules.yaml
or
rules:
  fromLiteral: |-
    groups:
    - name: prometheus.rules
      rules:
      - alert: HTTPRequestRateLow
        expr: http_requests{group="canary", job="app-server"} < 100
        for: 1m
        labels:
          severity: critical
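For the absolute-path case mentioned above, a sketch (the path here is hypothetical):

rules:
  fromFile: /etc/prometheus/rules/http-rules.yaml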
The fixtures section defines a list of metrics fixtures that the tests will use.
Each item in the list has the following attributes:
- duration - How long these metrics hold the specified values. The duration must be in a format accepted by Golang's time.ParseDuration(), e.g., 5m (5 minutes), 1h (1 hour), etc.
- metrics - The metrics and their values
fixtures:
- duration: 5m
  metrics:
  - http_requests{job="app-server", instance="0", group="blue"} 75
  - http_requests{job="app-server", instance="1", group="blue"} 120
This creates these two metrics, holding their values for 5 minutes.
You can also specify multiple values per metric:
fixtures:
- duration: 5m
  metrics:
  - http_requests{job="app-server", instance="0", group="blue"} 75 100 200
In this case, the metric http_requests{job="app-server", instance="0", group="blue"} will be set to 75 for the first 5 minutes, 100 for the next 5 minutes, and 200 for the 5 minutes after that. You can use this form to easily set up long-running time series.
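For instance, a sketch of a longer series, stepping once per hour (the duration and values here are illustrative):

fixtures:
- duration: 1h
  metrics:
  - http_requests{job="app-server", instance="0", group="blue"} 100 200 300 400

This sets the metric to 100 for the first hour, 200 for the second, and so on, producing a four-hour time series from a single line.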
The assertions section contains a list of expectations for when the alert rules are evaluated at a certain time.
- at - The instant at which the rules are evaluated
- expected - The list of expected alert properties
assertions:
- at: 0m
  expected:
  - alertname: HTTPRequestRateLow
    alertstate: pending
    job: app-server
    severity: critical
- at: 5m
  expected:
  - alertname: HTTPRequestRateLow
    alertstate: firing
    job: app-server
    severity: critical
- at: 10m
  expected: []
In this example, we're asserting that when the alert rules are evaluated at 0m, with the given fixtures, we should get the HTTPRequestRateLow alert in the pending state; when evaluated at 5m, the alert should be in the firing state; and when evaluated at 10m, we shouldn't get any alert.
Suppose you have the following rule file that you want to test:
groups:
- name: prometheus.rules
  rules:
  - alert: HTTPRequestRateLow
    expr: http_requests{group="canary", job="app-server"} < 100
    for: 1m
    labels:
      severity: critical
Write a YAML file with your test cases:
name: Test HTTP Requests too low alert
rules:
  fromFile: rules.yaml
fixtures:
- duration: 5m
  metrics:
  - http_requests{job="app-server", instance="0", group="canary", severity="overwrite-me"} 75 85 95 105 105 95 85
  - http_requests{job="app-server", instance="1", group="canary", severity="overwrite-me"} 80 90 100 110 120 130 140
assertions:
- at: 0m
  expected:
  - alertname: HTTPRequestRateLow
    alertstate: pending
    group: canary
    instance: "0"
    job: app-server
    severity: critical
  - alertname: HTTPRequestRateLow
    alertstate: pending
    group: canary
    instance: "1"
    job: app-server
    severity: critical
  comment: |-
    At 0m, the alerts have met the threshold but not the duration requirement. Expect the alerts to be pending.
- at: 5m
  expected:
  - alertname: HTTPRequestRateLow
    alertstate: firing
    group: canary
    instance: "0"
    job: app-server
    severity: critical
  - alertname: HTTPRequestRateLow
    alertstate: firing
    group: canary
    instance: "1"
    job: app-server
    severity: critical
  comment: |-
    At 5m, the alerts should be firing because the duration requirement is met.
- at: 10m
  expected:
  - alertname: HTTPRequestRateLow
    alertstate: firing
    group: canary
    instance: "0"
    job: app-server
    severity: critical
  comment: |-
    At 10m, the alert should be firing only for instance 0, because instance 1's value is >= 100.
- at: 15m
  expected: []
  comment: |-
    At 15m, both instances are back to normal, therefore we expect no alerts.
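Note the severity="overwrite-me" label on the fixture metrics: the expected alerts carry severity: critical instead, since labels attached to an alerting rule overwrite any conflicting labels coming from the underlying metrics.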
Run the test:
$ ./pat examples/test.yaml
=== RUN Test_HTTP_Requests_too_low_alert_at_0m
--- PASS: Test_HTTP_Requests_too_low_alert_at_0m (0.00s)
=== RUN Test_HTTP_Requests_too_low_alert_at_5m
--- PASS: Test_HTTP_Requests_too_low_alert_at_5m (0.00s)
PASS