filebeat_up metric issue
Hi guys, first of all, thanks for the great work and support that you've put into this project!
I want to report an issue I ran into when deploying beat-exporter as a sidecar container in a pod next to filebeat in a Kubernetes environment:
If the filebeat pod, which exports its metrics on port 5066, is in a failed state other than CrashLoopBackOff, filebeat_up returns 0. That is the expected behavior, and everything works fine.
If the filebeat pod enters CrashLoopBackOff, beat-exporter doesn't register anything related to the pod, so filebeat_up is absent, along with all other metrics for this particular pod.
beat-exporter logs while the filebeat pod is in CrashLoopBackOff status:
{"level":"error","message":"Could not load beat type, with error: Get http://localhost:5066: dial tcp 127.0.0.1:5066: connect: connection refused, retrying in 1s","time":"2021-01-05T09:58:45Z"}
{"level":"error","message":"Could not load beat type, with error: Get http://localhost:5066: dial tcp 127.0.0.1:5066: connect: connection refused, retrying in 1s","time":"2021-01-05T09:58:46Z"}
And here is the case where the pod is not in CrashLoopBackOff / Error but in a different failed state, and filebeat_up is correctly evaluated to 0:
{"level":"error","message":"Failed getting /stats endpoint of target: Get http://localhost:5066/stats: dial tcp 127.0.0.1:5066: connect: connection refused","time":"2021-01-05T09:59:04Z"}
{"level":"error","message":"Could not fetch stats endpoint of target: http://localhost:5066","time":"2021-01-05T09:59:25Z"}
{"level":"error","message":"Failed getting /stats endpoint of target: Get http://localhost:5066/stats: dial tcp 127.0.0.1:5066: connect: connection refused","time":"2021-01-05T09:59:25Z"}
{"level":"error","message":"Could not fetch stats endpoint of target: http://localhost:5066","time":"2021-01-05T09:59:34Z"}
The issue here is that in the first case ☝️ your beat never reached the "ready" state, so beat-exporter doesn't know what type of beat to expect. In the second case, it looks like the beat crashed after previously being healthy: beat-exporter managed to get the beat type and initialize itself against it, and it now returns 0 because the beat has crashed and is not reachable.
I'm referring to this initialization loop: https://github.com/trustpilot/beat-exporter/blob/master/main.go#L93. In one case beat-exporter is stuck in this loop; in the other case it is past that loop and in the main "proxy" loop.
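
To make the two phases concrete, here is a minimal sketch of the behavior described above. It is not the actual beat-exporter code: the `exporter` type, `probeFunc`, `initialize`, `scrape`, and the `refused` / `healthy` helpers are hypothetical names made up for illustration, and the probe is a plain function standing in for the HTTP call to port 5066. The real initialization loop keeps retrying (the logs show "retrying in 1s"); the sketch bounds the attempts only so that it terminates.

```go
package main

import (
	"errors"
	"fmt"
)

// probeFunc stands in for an HTTP GET against the beat on port 5066.
// It returns the beat type (for example "filebeat") or an error.
type probeFunc func() (string, error)

// exporter is a hypothetical, simplified model of the two phases.
type exporter struct {
	beatType string // empty until the initialization phase succeeds
}

var errRefused = errors.New("connect: connection refused")

// refused simulates a beat that is not listening (assumed helper).
func refused() (string, error) { return "", errRefused }

// healthy simulates a reachable filebeat (assumed helper).
func healthy() (string, error) { return "filebeat", nil }

// initialize models the initialization loop: it retries the probe until the
// beat type is known. maxAttempts is an assumption to keep the sketch finite.
func (e *exporter) initialize(probe probeFunc, maxAttempts int) bool {
	for i := 0; i < maxAttempts; i++ {
		beatType, err := probe()
		if err == nil {
			e.beatType = beatType
			return true
		}
	}
	return false
}

// scrape models one scrape of the exporter. Before initialization no
// collector exists, so no metrics are exposed at all. After initialization
// <type>_up is 1 when the beat answers and 0 when it does not.
func (e *exporter) scrape(probe probeFunc) map[string]float64 {
	metrics := map[string]float64{}
	if e.beatType == "" {
		return metrics
	}
	up := 1.0
	if _, err := probe(); err != nil {
		up = 0
	}
	metrics[e.beatType+"_up"] = up
	return metrics
}

func main() {
	// Case 1: the beat was never reachable, so the exporter never learns its type.
	var stuck exporter
	stuck.initialize(refused, 3)
	fmt.Println(stuck.scrape(refused)) // prints map[]

	// Case 2: the beat was healthy once, then crashed.
	var ready exporter
	ready.initialize(healthy, 3)
	fmt.Println(ready.scrape(refused)) // prints map[filebeat_up:0]
}
```

In this model the metric name itself depends on the beat type, which is why nothing (not even an up metric with value 0) can be reported until the first successful probe.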