francoismdj/netbox-kea-dhcp

Program crashes when dhcp4 service is offline

During a server reboot, netbox-kea-dhcp crashes with the following trace:

août 22 09:32:08 charade netbox-kea-dhcp[846]: netbox: https://netbox-srv, kea: http://127.0.0.1:8000/
août 22 09:32:08 charade netbox-kea-dhcp[846]: Start full sync
août 22 09:32:08 charade netbox-kea-dhcp[846]: pull running config from DHCP server
août 22 09:32:08 charade netbox-kea-dhcp[846]: Traceback (most recent call last):
août 22 09:32:08 charade netbox-kea-dhcp[846]:   File "/usr/local/bin/netbox-kea-dhcp", line 8, in <module>
août 22 09:32:08 charade netbox-kea-dhcp[846]:     sys.exit(run())
août 22 09:32:08 charade netbox-kea-dhcp[846]:   File "/usr/local/lib/netbox-kea-dhcp/lib/python3.10/site-packages/netboxkea/entry_point.py", line 32, in run
août 22 09:32:08 charade netbox-kea-dhcp[846]:     conn.sync_all()
août 22 09:32:08 charade netbox-kea-dhcp[846]:   File "/usr/local/lib/netbox-kea-dhcp/lib/python3.10/site-packages/netboxkea/connector.py", line 83, in sync_all
août 22 09:32:08 charade netbox-kea-dhcp[846]:     self.kea.pull()
août 22 09:32:08 charade netbox-kea-dhcp[846]:   File "/usr/local/lib/netbox-kea-dhcp/lib/python3.10/site-packages/netboxkea/kea/app.py", line 64, in pull
août 22 09:32:08 charade netbox-kea-dhcp[846]:     self.conf = self.api.get_conf()
août 22 09:32:08 charade netbox-kea-dhcp[846]:   File "/usr/local/lib/netbox-kea-dhcp/lib/python3.10/site-packages/netboxkea/kea/api.py", line 68, in get_conf
août 22 09:32:08 charade netbox-kea-dhcp[846]:     return self._request_kea('config-get')['Dhcp4']
août 22 09:32:08 charade netbox-kea-dhcp[846]:   File "/usr/local/lib/netbox-kea-dhcp/lib/python3.10/site-packages/netboxkea/kea/api.py", line 60, in _request_kea
août 22 09:32:08 charade netbox-kea-dhcp[846]:     raise KeaCmdError(f'command "{command}" returns "{text}"')
août 22 09:32:08 charade netbox-kea-dhcp[846]: netboxkea.kea.exceptions.KeaCmdError: command "config-get" returns "unable to forward command to the dhcp4 service: No such file or directory. The server is likely to be offline"

Expected result: handle errors thrown by the Kea API and retry later, once the dhcp4 service is alive.

We could keep retrying the Kea API command until it succeeds or until it has failed too many times. During that time, no more webhook events would be processed (netbox-kea-dhcp doesn’t manage any request queue itself; only the HTTP layer does).

In the end, if the command still fails, we should still crash, because we don’t want to lose some events and not others, which would lead to an inconsistent sync state. Alternatively, a full sync could be scheduled (possibly automatically, if the admin has agreed to it).
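The retry-then-crash behaviour described above could look roughly like the following. This is only a sketch: `retry_kea`, `max_attempts`, `delay` and `sleep` are hypothetical names, and `KeaCmdError` is redefined here as a stand-in for `netboxkea.kea.exceptions.KeaCmdError` so the snippet runs on its own.

```python
import time


class KeaCmdError(Exception):
    """Stand-in for netboxkea.kea.exceptions.KeaCmdError (assumption)."""


def retry_kea(call, max_attempts=5, delay=2.0, sleep=time.sleep):
    """Run call() until it succeeds; re-raise after max_attempts failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except KeaCmdError:
            if attempt == max_attempts:
                # Give up: crashing is safer than a partially applied sync.
                raise
            # Blocking wait: no webhook event is processed in the meantime.
            sleep(delay)
```

It would presumably wrap the call that fails in the trace, e.g. `retry_kea(lambda: api.get_conf())`, where `api` stands for the existing Kea API object.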

One possibility is to let the listener keep receiving events and stack them in a queue, up to a configured maximum size, until the connection is restored. Or maybe build a dummy dict/JSON dump of the config locally, then merge the local copy with the dhcp4 service’s config when it comes back online.
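To make the queue idea concrete, here is a minimal sketch of a bounded backlog. `EventBacklog`, `max_size`, `apply_event` and `full_sync` are all hypothetical names, not part of netbox-kea-dhcp, and falling back to a full sync on overflow is one assumed policy among others.

```python
from collections import deque


class EventBacklog:
    """Bounded FIFO of pending events; overflow means a full sync is needed."""

    def __init__(self, max_size=100):
        self.max_size = max_size
        self.events = deque()
        self.overflowed = False

    def push(self, event):
        if len(self.events) >= self.max_size:
            # Keeping only some events would give an inconsistent state,
            # so drop the whole backlog and remember that a full sync is due.
            self.events.clear()
            self.overflowed = True
        if not self.overflowed:
            self.events.append(event)

    def drain(self, apply_event, full_sync):
        """Call once Kea is reachable again."""
        if self.overflowed:
            full_sync()
        else:
            while self.events:
                # Peek first so an event is kept if apply_event raises.
                apply_event(self.events[0])
                self.events.popleft()
        self.overflowed = False
```

The listener would call `push()` while Kea is unreachable and `drain()` once it answers again; replaying events one by one keeps the option of checking the config after each update.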

Hi,
I like the last idea, building a JSON dump, for its simplicity, since the configuration pushed to Kea on NetBox events is already the whole config in JSON format. But there’s a drawback: it will no longer be possible to check the config right after each update. If the JSON file pushed to Kea (when it eventually comes online) generates an error, all the cached events will be lost and a full sync will be the only solution.

Stacking would avoid this limitation, at the cost of more complexity.

If we want to keep the code simple, we can just handle the error so as to exit gracefully, and watch the process with monitoring software. Starting kea-dhcp4 before netbox-kea-dhcp is then a requirement.
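The simple option might be as small as this. It is a sketch only: `run_or_exit` and `exit_code` are hypothetical names, and `KeaCmdError` is a local stand-in for `netboxkea.kea.exceptions.KeaCmdError`.

```python
import logging
import sys


class KeaCmdError(Exception):
    """Stand-in for netboxkea.kea.exceptions.KeaCmdError (assumption)."""


def run_or_exit(sync_all, exit_code=1):
    """Run the sync; on a Kea API error, log one line and exit non-zero."""
    try:
        sync_all()
    except KeaCmdError as exc:
        # A short message instead of a traceback; the non-zero status lets
        # whatever supervises the process notice the failure.
        logging.error("Kea API unavailable: %s", exc)
        sys.exit(exit_code)
```

In `entry_point.py` this would presumably wrap the existing `conn.sync_all()` call, e.g. `run_or_exit(conn.sync_all)`.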