ZMap crashes when the host clock goes backwards
maxmouchet opened this issue · 3 comments
Describe the bug
ZMap randomly crashes with an assertion failure:
3:22 16% (17m left); send: 7052636 35.0 Kp/s (34.9 Kp/s avg); recv: 733668 3.39 Kp/s (3.63 Kp/s avg); drops: 0 p/s (0 p/s avg); hitrate: 10.40%
3:23 17% (17m left); send: 7087436 34.8 Kp/s (34.9 Kp/s avg); recv: 736653 2.98 Kp/s (3.63 Kp/s avg); drops: 0 p/s (0 p/s avg); hitrate: 10.39%
zmap: /root/zmap/src/send.c:322: send_run: Assertion `t > last_time' failed.
CLI Arguments
zmap -r 35000 -M icmp_echoscan -w zmap_targets.csv -b /root/blacklist_v4 -o icmp.csv
Expected behavior
ZMap should not crash.
Environment:
- OS: Ubuntu 16.04.7
- Version: 7471d5413c5453c81
Additional context
There is probably an issue on our host causing the clock to go backwards (e.g. NTP).
However, I noticed that ZMap uses gettimeofday
which makes no guarantee on the monotonicity of the clock:
The time returned by gettimeofday() is affected by discontinuous jumps in the system time (e.g., if the system administrator manually changes the system time). If you need a monotonically increasing clock, see clock_gettime(2).
Here is a small program to reproduce the issue (on affected hosts):
#include <stdio.h>
#include <sys/time.h>
// From zmap/src/logger.c
double now(void) {
struct timeval now;
gettimeofday(&now, NULL);
return (double) now.tv_sec + (double) now.tv_usec / 1000000.;
}
int main() {
double last_time = now();
while (1) {
double t = now();
if (t < last_time) {
printf("Non-monotonic clock: now=%f last=%f diff=%f\n", t, last_time, t - last_time);
break;
}
last_time = t;
}
return 0;
}
gcc -O3 main.c && ./a.out
# After a few seconds:
# Non-monotonic clock: now=1671705231.484864 last=1671705232.725731 diff=-1.240867
This issue is bound to happen randomly when ZMap runs for a long time.
Would it makes sense for ZMap to use clock_gettime
instead of gettimeofday
?
I don't believe clock_gettime
is POSIX standard, but I'd be fine using it on Linux (and other OSes where it's implemented). I'd be happy to look at a PR that changes this, but it's likely not going to rise the top of the priority list for the main development team to tackle in the foreseeable future compared to some other pending issues.
Sure, and that's a good point! We'll try this internally and I'll open a PR if it solves the problem.
Edit: it seems it is part of POSIX (https://pubs.opengroup.org/onlinepubs/9699919799/functions/clock_getres.html).
Fixed by #708