Inconsistent alerts triggered by the Prometheus alert manager
varampati6 opened this issue · 0 comments
Host operating system: output of uname -a
blackbox_exporter version: output of blackbox_exporter --version
blackbox_exporter, version 0.20.0 (branch: HEAD, revision: 91372eb)
build user: root@d6d8976bddf4
build date: 20220316-17:42:45
go version: go1.17.8
platform: linux/amd64
What is the blackbox.yml module config.
modules:
  https_2xx:
    prober: http
    timeout: 5s
    http:
      valid_http_versions: ["HTTP/1.0", "HTTP/1.1", "HTTP/2.0"]
      method: GET
      preferred_ip_protocol: "ip4"
      valid_status_codes: [200, 403, 404, 502]  # An empty list defaults to 2xx
      fail_if_ssl: false
      fail_if_not_ssl: true
      tls_config:
        insecure_skip_verify: true
What is the prometheus.yml scrape config.
global:
  scrape_interval: 60s
  evaluation_interval: 10m
scrape_configs:
  - job_name: 'blackbox'
    metrics_path: '/probe'
    params:
      module:
        - https_2xx
    static_configs:
What logging output did you get from adding &debug=true to the probe URL?
What did you do that produced an error?
I applied the configuration shown above.
What did you expect to see?
The Prometheus query probe_success{job="blackbox"} == 0 should return only the applications that have been down for the full 10-minute interval specified in the configuration above. In my case, the Jira application mentioned above is down.
What did you see instead?
Instead of producing an alert for that one result, probe_success{job="blackbox"} == 0 returns inconsistent results. For instance, the Jira application may be down for only a couple of minutes, and sometimes the Jenkins application is reported as down even though it shows as up when I run the probe_success query.
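
For reference, here is a minimal sketch of how the 10-minute requirement is usually expressed in an alerting rule. The evaluation_interval setting only controls how often rules are evaluated. It does not require a target to have been down for that long. The "for" clause is what makes Prometheus wait until the expression has been true continuously before the alert fires. The group name, alert name, severity label, and annotation text are hypothetical placeholders, not taken from my setup.

```yaml
groups:
  - name: blackbox-availability        # hypothetical group name
    rules:
      - alert: EndpointDown            # hypothetical alert name
        # True for every target whose most recent probe failed.
        expr: probe_success{job="blackbox"} == 0
        # Stay pending until the expression has held for 10 minutes,
        # so a single failed probe does not fire an alert.
        for: 10m
        labels:
          severity: critical           # hypothetical label
        annotations:
          summary: "{{ $labels.instance }} has failed blackbox probes for 10 minutes"
```

For an ad-hoc query, max_over_time(probe_success{job="blackbox"}[10m]) == 0 should return only the targets whose probes all failed during the last 10 minutes.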