bgpkit/bgpkit-broker

updater thread may get stuck after remote connection error

Closed · 1 comment

2024-07-18T05:38:12.481896Z  INFO heartbeat sent
2024-07-18T05:38:12.481905Z  INFO finished updating broker database
2024-07-18T05:38:12.481925Z  INFO wait for 300 seconds before next update
2024-07-18T05:42:42.004023Z  INFO successfully connected to NATS server with root subject: public.broker.
2024-07-18T05:42:42.004050Z  INFO event: connected
2024-07-18T05:42:42.004257Z  INFO update broker db from the latest date - 1 in db: 2024-07-17
2024-07-18T05:42:42.004261Z  INFO start updating broker database for 63 collectors
2024-07-18T05:43:03.773417Z ERROR NetworkError: request or response body error: error reading a body from connection: Connection reset by peer (os error 54)
2024-07-18T05:43:09.727433Z ERROR NetworkError: error sending request for url (https://data.ris.ripe.net/rrc18/2024.07): error trying to connect: tls handshake eof

As shown in the recent incident log above, the crawler thread encountered request errors at around 05:43, and there was no further activity or log output by the time of checking at around 06:30. It looks like the thread that fetches MRT metadata got stuck or died while the main thread was unaware of it and continued serving stale data. The issue was temporarily resolved by restarting the whole program, but we need to make sure that the main thread either detects sub-thread failures and errors out, or handles the errors gracefully. It is not acceptable for the main thread and API thread to keep running while the fetching thread is inactive.
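One possible way to make the main thread aware of a dead or stalled updater is sketched below. This is only an illustration using the Rust standard library, not a description of the fix in 596cc8d. The names `Heartbeat`, `UpdaterFault`, and `check_updater` are hypothetical and do not come from bgpkit-broker. The sketch assumes the updater thread calls `beat()` whenever it makes progress, and that the main thread periodically polls `check_updater`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread::JoinHandle;
use std::time::{Duration, Instant};

/// Why the supervisor considers the updater unhealthy (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum UpdaterFault {
    /// The thread returned or panicked.
    Exited,
    /// The thread is alive but has not reported progress in time.
    Stalled,
}

/// Shared progress marker (hypothetical type): stores the time of the
/// last heartbeat as milliseconds elapsed since `start`.
pub struct Heartbeat {
    start: Instant,
    last_ms: AtomicU64,
}

impl Heartbeat {
    pub fn new() -> Arc<Self> {
        Arc::new(Self {
            start: Instant::now(),
            last_ms: AtomicU64::new(0),
        })
    }

    /// Called by the updater thread each time it makes progress.
    pub fn beat(&self) {
        let now = self.start.elapsed().as_millis() as u64;
        self.last_ms.store(now, Ordering::Relaxed);
    }

    /// Time elapsed since the last heartbeat (or since creation).
    pub fn silence(&self) -> Duration {
        let now = self.start.elapsed().as_millis() as u64;
        let last = self.last_ms.load(Ordering::Relaxed);
        Duration::from_millis(now.saturating_sub(last))
    }
}

/// Called periodically by the main thread. Reports a fault if the
/// updater thread has exited or has been silent for too long.
pub fn check_updater(
    handle: &JoinHandle<()>,
    heartbeat: &Heartbeat,
    max_silence: Duration,
) -> Result<(), UpdaterFault> {
    if handle.is_finished() {
        return Err(UpdaterFault::Exited);
    }
    if heartbeat.silence() > max_silence {
        return Err(UpdaterFault::Stalled);
    }
    Ok(())
}
```

With something like this in place, the main thread could log the fault and exit with a non-zero status when `check_updater` returns `Err`, so that a process supervisor restarts the program instead of letting the API serve stale data. `max_silence` would need to be comfortably larger than the 300-second wait between updates seen in the log, plus the expected duration of one crawl. A stall of the kind suspected here might also be prevented at the source by setting timeouts on the HTTP requests, which this sketch does not cover.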

Resolved in 596cc8d.