infrequent bug in ftio:ftltime()
Closed this issue · 1 comments
GoogleCodeExporter commented
What steps will reproduce the problem?
see following
What is the expected output? What do you see instead?
some flows from 01-May are printed with a 20-Jun date
What version of the product are you using? On what operating system?
flow-tools-0.68.4, RHEL 5 on x86_64
Please provide any additional information below.
ftltime() hasn't changed since the ftio.c library file was created in 0.40,
but produces an incorrect result for maybe 10 minutes every 49.7 days
(unless your routers are rebooted at least this frequently).
This was noticed when I recently changed the code I use to process each
month's worth of flows to record the earliest and latest flow timestamp
seen. The earliest timestamp seen in the May flows was just before midnight
on Apr-30, as expected. However the latest flow seen was from Jun-20.
It was unlikely flow records/files had gotten mixed up since that date was
in the future. Now I was using a slightly modified version of ftltime()
that produced a result like struct fttime except there was a third field
for microseconds which I had a one-off interest in, but found they were
always zero from our routers.
I soon narrowed the problem down to flows occuring around 9:23PM on May-01.
I didn't think there was a problem with my code as the changes were so
small, and this was confirmed by grepping the output of flow-print -f 2
for '0620' with results like
00e0 203.100.24.26 0000 203.100.25.255 11 89 89 18 1404
0620.14:25:45.188 0620.14:25:55.160 9.972 78 00 10
004d 203.100.30.221 0000 203.100.30.255 11 8a 8a 2 458
0620.14:25:51.380 0501.21:23:08.080 3.996 229 00 10
From printing out the 4 values supplied to ftltime() when anomalous
results were produced, it was clear this happened just after sysUpTime
rolled-over from 4billion to a small number. ftltime() was producing
a difference close to 2^32 rather than the much smaller number it should
be. 2^32 milliseconds / 1000 / 86400 is about 49.7 days, which fits since
May-01 to Jun-20 is 49-50 days.
I believe I've fixed this problem in my version of ftltime(), but added
code to print out lines like the following whenever my result was diff-
erent to the ftltime() library version and these lines happen only when
the library version is producing Jun-20 dates, i.e. my version behaves
the same in all other cases.
sys 19200, secs 1209641007, msecs 184, First 4294952156, Last 19168
First lib_ftltime 0620.14:25:40.140, my_ftltime 0501.21:22:52.844
Last lib_ftltime 0501.21:23:27.152, my_ftltime 0501.21:23:27.152
As it turns out, I also checked the flow duration calculated directly
from Last-First against that of the ftt structs for First and Last for
both my version and the library version and both were always correct.
You wouldn't normally expect this from buggy code but it probably makes
perfect sense if you analysed the way way unsigned ints behave.
Here's my corrected code, and while I wouldn't expect all the comments
to make it into you repository, I think there needs to be something
indicating there was an issue in the past and that people should only
make changes with some care; in particular testing against cases when
sysUpTime has rolled-over while t hasn't.
/*
* [DMT 13-Jun-2008] corrected version of based on ftltime()
* ftltime returns a struct fttime = secs - sys + t
* But fails to handle the case when sys has recently rolled-over 2^32
* so sys is a small uint32_t value while t is close to 2^32 milliseconds.
* The result is about 47 days wrong.
*
* The approach used to fix this is to treat (sys - t) as a signed number
* which we expect to be quite small so we don't worry about corner cases
* like -2^31.
*
* I was guessing there would be a fixed relationship between sys and t,
* so either sys or t would roll-over first. I haven't really looked but
* suspect First is always behind sys. However while Last is normally
* behind sys:
* sysUpTime 17660, unix_secs 1209641005, First 1740, Last 1740
* here's a counter-example from the same 5 minute flow file
* sysUpTime 13660, unix_secs 1209641001, First 4294965144, Last 14716
*
* NB the case of Last>sys did not produce an error with the original code
* because the arithmetic was done in 2 stages, both involving the value
* divided by 1,000, and hence not subject to roll-over. The problem only
* happens when only one of sys and Last have rolled-over. Perhaps a correction
* factor of 2^32/1000 could have been applied to both the seconds and
* millisecond values.
*
* NB an alternative approach might be to use larger signed types, but the
* binary '%' operator behaves as a mathematical mod operation for positive
* values, i.e. for unsigned types. However "if either operand is negative,
* the behaviour will be machine-dependent":
* C: A Reference Manual 4th ed, p202.
* The best approach might well be to use doubles as they'll easily handle
* 32 seconds bits + 10 milliseconds bits.
*
*/
struct fttime my_ftltime(uint32_t sys, uint32_t secs, uint32_t nsecs, uint32_t
t)
{
uint32_t diff, diff_s, diff_m;
struct fttime ftt;
int diff_was_negative = 0;
/* unix seconds/nanoseconds to seconds/milliseconds */
ftt.secs = secs;
ftt.msecs = nsecs / 1000000L;
/* sys and t are in milliseconds */
diff = sys - t;
/* we don't expect the unsigned difference to be anywhere near 2^31 */
/* but allow +/- 2^31, approx 2 billion anyway */
if (diff > 2147483648) {
diff_was_negative = 1;
/* negate to have an absolute value before deriving secs/msecs */
diff = -diff;
}
diff_s = diff / 1000;
diff_m = diff % 1000;
/* calculate ftt -= diff */
if (diff_was_negative) {
/* subtracting a negative value means add */
ftt.secs += diff_s;
ftt.msecs += diff_m;
if (ftt.msecs >= 1000) {
/* handle overflow */
ftt.msecs -= 1000;
ftt.secs++;
}
} else {
/* simply subtract */
ftt.secs -= diff_s;
if (ftt.msecs < diff_m) {
/* handle underflow */
ftt.msecs += 1000;
ftt.secs--;
}
ftt.msecs -= diff_m;
}
return ftt;
} /* ftltime */
Original issue reported on code.google.com by dmt3.14...@gmail.com
on 17 Jun 2008 at 4:38
GoogleCodeExporter commented
Hi!
I'd like to thank you for the analysis. Unfortunately I think this approach
will be
better: http://gitorious.org/projects/flow-
tools/repos/mainline/commits/de05981cd0038d0d802f0e6727b54d6d21688282
If you have any objections you can still raise them. I just don't want to
include all
those complexity here if it can be done relatively cheap, especially given that
we're
heading towards 64-bit platforms anyway (all my collectors are already
64-bit...)
Original comment by therap...@gmail.com
on 10 Feb 2009 at 2:04
- Changed state: Fixed