MAX_WAIT_TIME for rescan should be a config option
mattjala opened this issue · 1 comment
The value is currently hardcoded. Letting it be set would allow the runners to wait longer and avoid failures like this.
This issue isn't due to scans taking a long time. The domain scans are actually getting stuck in an infinite wait due to inaccurate timestamps after #346
Sometimes, a scan would record a completion timestamp that was slightly BEFORE the recorded time that the rescan request was sent out. Because the check to stop waiting requires a scan-finished timestamp later than the scan request time, the wait would never terminate and eventually return a 503. The inaccuracy occurs because the node that records the scan completion time is a different node than the one that records the request time.
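A minimal sketch of the stuck-wait logic described above (function and parameter names are hypothetical, not the actual implementation):

```python
import time

def wait_for_rescan(get_scan_complete_ts, request_ts, max_wait=10.0, poll=0.1):
    """Poll until the recorded scan-completion timestamp is later than the
    time the rescan request was sent. Sketch only; names are assumed."""
    waited = 0.0
    while waited < max_wait:
        # If the node recording completion has a clock slightly behind the
        # node that recorded the request, the completion timestamp can land
        # BEFORE request_ts, this condition never becomes true, and the
        # caller eventually gives up with a 503.
        if get_scan_complete_ts() > request_ts:
            return True
        time.sleep(poll)
        waited += poll
    return False  # caller would translate this into a 503
```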
I'm not sure why `getNow()` is more inconsistent between nodes than `time.time()`. Even when nodes start at different times, `time.perf_counter() - app["start_time_relative"]` should be a precise measure of how long the node has been online, and `app["start_time"]` should be an OS-precision UNIX timestamp. Adding the two should produce a UNIX timestamp for the current time that is no more inaccurate than `time.time()`. It shouldn't be a problem with async operations, since `perf_counter` continues to count during sleep and is system-wide.
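The arithmetic above can be sketched as follows (the `app` keys are taken from the discussion; this is an illustration, not the project's actual `getNow()`):

```python
import time

# Recorded once at node startup (keys assumed from the discussion above).
app = {
    "start_time": time.time(),                  # wall-clock UNIX time at startup
    "start_time_relative": time.perf_counter(), # monotonic reference point
}

def get_now(app):
    """UNIX timestamp = wall-clock startup time + monotonic elapsed time."""
    elapsed = time.perf_counter() - app["start_time_relative"]
    return app["start_time"] + elapsed
```

On a single node this should track `time.time()` closely; across nodes, any error in each node's recorded `start_time` would show up as a roughly constant offset between their reported clocks.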