czerwonk/ping_exporter

ping_loss_percent

krakazyabra opened this issue · 2 comments

Hi! Thanks for the exporter!
Could you please explain how ping_loss_percent reports its value?
Why do I see 1 on an inactive host? Shouldn't it be 100? And why do I also see the value 1 on working hosts?
I assumed it would be a percentage, e.g. rate(ping_loss_percent[2m]) returning 100 would mean that 100% of packets were lost in the last 2 minutes.

dmke commented

The value range follows Prometheus best practices, whereby percentages are expressed as values between 0 (= 0%) and 1 (= 100%).

Hence, seeing 1 for an inactive host is expected. As for why you are seeing the same value for working hosts, I cannot say. Either there is (or was) a bug in the exporter or the underlying ping library (unlikely, as we use it extensively), or the host running the ping_exporter binary can't reach the target host.

The alert you're trying to model would be the following:

# /path/to/prometheus/ping.rules
---
groups:
- name: Ping
  rules:
  - alert: PingPacketLost
    expr: ping_loss_percent{job="ping"} == 1
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "{{ $labels.target }} ({{ $labels.ip }}) not reachable"
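
If partial loss should alert as well, one possible variation averages the ratio over a window instead of requiring a sustained value of 1. This is only a sketch: the alert name PingPacketLossHigh, the 0.5 threshold, and the warning severity are illustrative assumptions, and it assumes ping_loss_percent is exported as a gauge (which is why avg_over_time is used rather than rate, since rate is meant for counters).

```yaml
# Sketch only: alert name, threshold, and severity are hypothetical.
groups:
- name: Ping
  rules:
  - alert: PingPacketLossHigh
    # Average loss ratio over the last 2 minutes; 0.5 corresponds to 50%.
    expr: avg_over_time(ping_loss_percent{job="ping"}[2m]) > 0.5
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.target }} ({{ $labels.ip }}) lost more than 50% of packets over 2m"
```

Because the window is already part of the expression, this variant does not need a separate `for:` clause to cover the last 2 minutes.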

Arguably, this metric is badly named, since it is not a percentage at all. A better name, and one more consistent with Prometheus best practices, would be ping_loss_ratio.
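
If an actual percentage is wanted, for example on a dashboard, it could be derived from the ratio with a recording rule. The following is a minimal sketch under assumptions: the group name PingDerived and the recorded series name ping:loss_percent:avg2m are hypothetical, and the metric is again assumed to be a gauge.

```yaml
# Sketch only: group name and recorded metric name are hypothetical.
groups:
- name: PingDerived
  rules:
  - record: ping:loss_percent:avg2m
    # Scale the 0..1 ratio to 0..100, averaged over the last 2 minutes.
    expr: 100 * avg_over_time(ping_loss_percent{job="ping"}[2m])
```

With such a rule in place, a value of 100 on the recorded series would mean that every packet sent in the last 2 minutes was lost.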