trustpilot/beat-exporter

Retries exceeded causes pod to wrongly restart

Opened this issue · 4 comments

When launching a new pod, the beats-exporter container shuts down with the following error message:

2022-10-12 09:30:48,767 - __main__ - ERROR - Error connecting Beat at port 5066:
HTTPConnectionPool(host='localhost', port=5066): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f149d86b0d0>: Failed to establish a new connection: [Errno 111] Connection refused'))

However, after a couple of restarts (usually no more than 2) the pod will be alive and ready. I assume the filebeat container is not ready fast enough, so the exporter exhausts its retries before filebeat's HTTP endpoint on port 5066 is listening.

Is there some k8s way to handle this? I can add a delay by overriding the container CMD, but that looks like kind of a hack to me.
Maybe we could set a startup delay using arguments? Or increase the interval between retries?
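
To make the "increase interval between retries" idea concrete, here is a minimal sketch of a wait loop with a growing delay. It is not the exporter's actual code: the function names `beat_is_up` and `wait_for_beat` and the parameters `attempts`, `interval`, `backoff`, and `max_interval` are all hypothetical, and only the port (5066) and the GET on `/` come from the error message above.

```python
import time
import urllib.error
import urllib.request


def beat_is_up(port: int, host: str = "localhost", timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP on http://host:port/ (hypothetical helper)."""
    try:
        with urllib.request.urlopen(f"http://{host}:{port}/", timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status; it is listening.
        return True
    except OSError:
        # Covers URLError, connection refused ([Errno 111]) and timeouts.
        return False


def wait_for_beat(port, attempts=10, interval=1.0, backoff=2.0,
                  max_interval=30.0, probe=beat_is_up, sleep=time.sleep) -> bool:
    """Probe the Beat up to `attempts` times, multiplying the delay by `backoff`
    after each failure and capping it at `max_interval` seconds."""
    delay = interval
    for attempt in range(1, attempts + 1):
        if probe(port):
            return True
        if attempt < attempts:  # no point sleeping after the final attempt
            sleep(delay)
            delay = min(delay * backoff, max_interval)
    return False
```

With the defaults above, the loop would wait 1, 2, 4, 8, 16 and then 30 seconds between attempts, which gives a slow filebeat container a few minutes to come up before the exporter gives up. The `probe` and `sleep` parameters are only there so the loop can be exercised without a real Beat.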

@OranShuster this is probably way too late, but anyways:

If you are running this in k8s, you could add a startupProbe to the exporter container which checks if the filebeat http server has started / is answering requests
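
As a rough sketch of what such a check could run, assuming the probe is an `exec` probe and the exporter image has Python available (both assumptions, not something confirmed in this thread), a small function can map "filebeat answered" to exit code 0 and anything else to 1. The name `probe_exit_code` and the `opener` parameter are hypothetical; the URL is taken from the error message in the issue. Note that a startup probe only holds off the liveness and readiness probes until it succeeds; it does not delay the exporter process itself.

```python
import urllib.error
import urllib.request

# Host and port taken from the error message in the issue.
BEAT_URL = "http://localhost:5066/"


def probe_exit_code(url: str = BEAT_URL, timeout: float = 2.0,
                    opener=urllib.request.urlopen) -> int:
    """Return 0 if the Beat HTTP server answers, 1 otherwise (hypothetical helper)."""
    try:
        with opener(url, timeout=timeout):
            return 0
    except urllib.error.HTTPError:
        # Any HTTP status at all means the server is up and answering.
        return 0
    except OSError:
        # Connection refused, DNS failure, timeout: not ready yet.
        return 1


# As a probe script, the last line would be:
#     raise SystemExit(probe_exit_code())
```

Kubernetes treats exit code 0 from an exec probe as success and any other exit code as failure, so the probe keeps failing until filebeat's HTTP server is reachable.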

@paketb0te this was solved with a sleep 10 && ....
Also, my cluster at the time was too old to support startup probes.

sleep 10, the poor man's startupProbe 😅
Are you currently still using the beat exporter? If so, is it still working?
(just wondering because the last commit is 4 years old)

@paketb0te when I was laid off in March it was still working. It was on filebeat 7, but I don't think the metric endpoint changed that much.