MAX_WAIT_TIME for rescan should be a config option
mattjala opened this issue · 1 comment
The value is currently hardcoded. Letting it be set would allow the runners to wait longer and avoid failures like this.
This issue isn't due to scans taking a long time. The domain scans are actually getting stuck in an infinite wait due to inaccurate timestamps after #346
Sometimes, a scan would record a completion timestamp that was slightly BEFORE the recorded time that the rescan request was sent out. Because the check to stop waiting requires a scan-finished timestamp later than the scan request time, the wait would never terminate and eventually return a 503. The inaccuracy occurs because the node that records the scan completion time is a different node than the one that records the request time.
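A minimal sketch of the stuck-wait logic described above (function and parameter names are hypothetical, not the actual implementation):

```python
import time

def wait_for_rescan(get_scan_complete_ts, request_ts, max_wait=10.0, poll=0.1):
    """Poll until the recorded scan-completion timestamp is later than the
    time the rescan request was sent. Sketch only; names are assumed."""
    waited = 0.0
    while waited < max_wait:
        # If the node recording completion has a clock slightly behind the
        # node that recorded the request, the completion timestamp can land
        # BEFORE request_ts, this condition never becomes true, and the
        # caller eventually gives up with a 503.
        if get_scan_complete_ts() > request_ts:
            return True
        time.sleep(poll)
        waited += poll
    return False  # caller would translate this into a 503
```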
I'm not sure why `getNow()` is more inconsistent between nodes than `time.time()`. Even when nodes start at different times, `time.perf_counter() - app["start_time_relative"]` should be a precise measure of how long the node has been online, and `app["start_time"]` should be an OS-precision UNIX timestamp. Adding the two should produce a UNIX timestamp for the current time that is no more inaccurate than `time.time()`. It shouldn't be a problem with async operations, since `perf_counter` continues to count during sleep and is system-wide.
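The arithmetic above can be sketched as follows (the `app` keys are taken from the discussion; this is an illustration, not the project's actual `getNow()`):

```python
import time

# Recorded once at node startup (keys assumed from the discussion above).
app = {
    "start_time": time.time(),                  # wall-clock UNIX time at startup
    "start_time_relative": time.perf_counter(), # monotonic reference point
}

def get_now(app):
    """UNIX timestamp = wall-clock startup time + monotonic elapsed time."""
    elapsed = time.perf_counter() - app["start_time_relative"]
    return app["start_time"] + elapsed
```

On a single node this should track `time.time()` closely; across nodes, any error in each node's recorded `start_time` would show up as a roughly constant offset between their reported clocks.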