Retries exceeded causes pod to restart unnecessarily
Opened this issue · 4 comments
When launching a new pod, the beats-exporter container shuts down with the following error message:
```
2022-10-12 09:30:48,767 - __main__ - ERROR - Error connecting Beat at port 5066:
HTTPConnectionPool(host='localhost', port=5066): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f149d86b0d0>: Failed to establish a new connection: [Errno 111] Connection refused'))
```
However, after a couple of restarts (usually no more than 2), the pod is alive and ready. I assume the filebeat container isn't ready fast enough, so the exporter exhausts its retries.
Is there some k8s way to handle this? I can add a delay by overriding the container CMD, but that feels like a hack to me.
Maybe we could set a startup delay via an argument, or increase the interval between retries?
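For the "interval between retries" idea, a minimal sketch of what a configurable wait loop could look like on the exporter side. The function name, defaults, and the assumption that the exporter is written in Python (suggested by the `__main__`/urllib3 traceback above) are all mine, not taken from the actual project:

```python
import time
import urllib.error
import urllib.request


def wait_for_beat(url, attempts=10, interval=3.0):
    """Hypothetical helper: poll the Beat HTTP endpoint until it answers.

    Retries up to `attempts` times, sleeping `interval` seconds between
    tries, instead of failing fast on the first connection refusal.
    """
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            # Beat not up yet (e.g. ECONNREFUSED); back off and retry
            if attempt < attempts - 1:
                time.sleep(interval)
    return False
```

Exposing `attempts` and `interval` as CLI arguments would let users tune the startup grace period without overriding the container CMD.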
@OranShuster this is probably way too late, but anyways:
If you are running this in k8s, you could add a startupProbe to the exporter container which checks whether the filebeat HTTP server has started / is answering requests.
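For reference, such a probe could look roughly like this. Container/image names and thresholds are illustrative; the probe targets port 5066 since containers in a pod share the network namespace:

```yaml
containers:
  - name: beats-exporter
    image: beats-exporter:latest   # illustrative image name
    startupProbe:
      httpGet:
        path: /
        port: 5066          # filebeat's HTTP endpoint, reachable pod-wide
      periodSeconds: 5
      failureThreshold: 12  # allow up to ~60s for filebeat to come up
```

Note that other probes (liveness/readiness) are held off until the startup probe succeeds, which gives filebeat time to start before the pod is considered failing.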
@paketb0te this was solved with a `sleep 10 && ...`.
Also, my cluster at the time was too old to support startup probes.
`sleep 10`, the poor man's startupProbe 😅
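In manifest terms, that workaround amounts to something like the following. The actual exporter invocation is elided in the comment above, so the command here is a placeholder:

```yaml
containers:
  - name: beats-exporter
    image: beats-exporter:latest   # illustrative image name
    # delay startup so filebeat's HTTP endpoint is up before the exporter connects
    command: ["sh", "-c", "sleep 10 && exec <exporter command>"]  # placeholder invocation
```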
Are you currently still using the beat exporter? If so, is it still working?
(just wondering because the last commit is 4 years old)
@paketb0te when I was laid off in March it was still working. It was on filebeat 7, but I don't think the metrics endpoint changed that much in 8.