trustpilot/beat-exporter

Explanation of metrics

Opened this issue · 9 comments

Love the exporter, but is there somewhere with a good description of what each of the metrics tracks?

There is no official documentation from the Elastic team (as far as I know), but these are the most important metrics:

# HELP filebeat_libbeat_output_events libbeat.output.events
# TYPE filebeat_libbeat_output_events untyped
filebeat_libbeat_output_events{type="acked"} 0
filebeat_libbeat_output_events{type="active"} 0
filebeat_libbeat_output_events{type="batches"} 0
filebeat_libbeat_output_events{type="dropped"} 0
filebeat_libbeat_output_events{type="duplicates"} 0
filebeat_libbeat_output_events{type="failed"} 0
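Apart from `active`, which looks like a gauge of in-flight events, these series seem to be cumulative counters even though the exporter marks them `untyped`. Their per-second rate is therefore usually more useful than the raw value. Below is a minimal Prometheus rules sketch that rests on that assumption. The group name, the recording rule name `filebeat:libbeat_output_events:rate5m`, the `FileBeatOutputFailures` alert and its threshold are all hypothetical. The alert also assumes, without verifying it, that `failed` goes up when batches can't be delivered to the output.

```yaml
groups:
  - name: filebeat-output-rates          # hypothetical group name
    rules:
      # Per-second rate of each output event type (acked, failed, dropped, ...).
      # Assumes these are monotonically increasing counters despite being
      # exposed as "untyped"; "active" is excluded because it behaves like a gauge.
      - record: filebeat:libbeat_output_events:rate5m
        expr: |
          rate(filebeat_libbeat_output_events{type!="active"}[5m])
      # Hypothetical alert: the output keeps reporting failed events.
      - alert: FileBeatOutputFailures
        expr: |
          rate(filebeat_libbeat_output_events{type="failed"}[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          description: Filebeat output is reporting failed events
```

The `type` label survives `rate()`, so `acked` and `failed` throughput can be compared side by side on a single graph.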

If you feel like digging in and figuring it all out, a PR that updates the metrics' help messages is welcome.

@shivas, can you please explain the difference between filebeat_libbeat_output_events and filebeat_libbeat_pipeline_events?

Also, which metric would indicate connection issues to Logstash/Elasticsearch?

Thanks!

Also there's filebeat_filebeat_events, to add some complication :)

+1 here. I'm trying to understand how many messages we are collecting and sending, but I can't tell which metric is which.

+1, agreed. It would be great to have documentation along the lines of https://github.com/ClusterLabs/ha_cluster_exporter/blob/master/doc/metrics.md, which could serve as inspiration.

plef commented

+1. Some of the semantics can be inferred from Kibana's built-in Filebeat monitoring, but that is just a screenshot with very limited explanatory value.

+1

The metric filebeat_libbeat_pipeline_events{type="active"} appears to count the events waiting to be sent. As an initial effort, I'm going to use it to monitor the Filebeat queue (formatted as a Helm template):

---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    {{- include "alert-rules.labels" . | nindent 4 }}
    {{- include "common-library.labels" . | nindent 4 }}
  name: alert-rules-filebeat
spec:
  groups:
    - name: alert-rules-filebeat
      rules:
        - alert: FileBeatQueueEmpty
          expr: |
            filebeat_libbeat_pipeline_events{type="active"} == 0
          for: 30m
          labels:
            severity: warning
          annotations:
            description: Filebeat queue is empty
        - alert: FileBeatQueueGrowing
          expr: |
            filebeat_libbeat_pipeline_events{type="active"} > 500 and
            delta(filebeat_libbeat_pipeline_events{type="active"}[15m]) > 0
          for: 15m
          labels:
            severity: warning
          annotations:
            description: |
              {{ `Filebeat queue is {{printf "%.0f" $value}} and growing` }}
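Helm would expand the inner `{{ ... }}` by itself, so the backtick wrapper in the description is there to emit it as a literal string that Prometheus expands later. If you are not deploying through Helm, that escaping isn't needed. Here is a sketch of the growing-queue rule as a plain Prometheus rule file, assuming it is loaded through `rule_files`. The expression and threshold are unchanged from the rule above.

```yaml
# Plain Prometheus rule file: no Helm escaping around the template.
groups:
  - name: alert-rules-filebeat
    rules:
      - alert: FileBeatQueueGrowing
        # "and" keeps the left-hand samples, so $value is the current
        # number of active (queued) events, not the 15m delta.
        expr: |
          filebeat_libbeat_pipeline_events{type="active"} > 500 and
          delta(filebeat_libbeat_pipeline_events{type="active"}[15m]) > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          description: 'Filebeat queue is {{ printf "%.0f" $value }} and growing'
```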
