adsr/flow-tools

infrequent bug in ftio:ftltime()

Closed this issue · 1 comments

What steps will reproduce the problem?
see following

What is the expected output? What do you see instead?
some flows from 01-May are printed with a 20-Jun date

What version of the product are you using? On what operating system?
flow-tools-0.68.4, RHEL 5 on x86_64

Please provide any additional information below.
ftltime() hasn't changed since the ftio.c library file was created in 0.40,
but produces an incorrect result for maybe 10 minutes every 49.7 days
(unless your routers are rebooted at least this frequently).

This was noticed when I recently changed the code I use to process each
month's worth of flows to record the earliest and latest flow timestamp
seen. The earliest timestamp seen in the May flows was just before midnight
on Apr-30, as expected. However the latest flow seen was from Jun-20.

It was unlikely flow records/files had gotten mixed up since that date was
in the future. Now I was using a slightly modified version of ftltime()
that produced a result like struct fttime except there was a third field
for microseconds which I had a one-off interest in, but found they were
always zero from our routers.

I soon narrowed the problem down to flows occuring around 9:23PM on May-01.

I didn't think there was a problem with my code as the changes were so
small, and this was confirmed by grepping the output of flow-print -f 2
for '0620' with results like
  00e0 203.100.24.26    0000 203.100.25.255    11 89   89    18   1404
   0620.14:25:45.188  0620.14:25:55.160      9.972 78  00 10
  004d 203.100.30.221   0000 203.100.30.255    11 8a   8a    2    458
   0620.14:25:51.380  0501.21:23:08.080      3.996 229 00 10

From printing out the 4 values supplied to ftltime() when anomalous
results were produced, it was clear this happened just after sysUpTime
rolled-over from 4billion to a small number. ftltime() was producing
a difference close to 2^32 rather than the much smaller number it should
be. 2^32 milliseconds / 1000 / 86400 is about 49.7 days, which fits since
May-01 to Jun-20 is 49-50 days.

I believe I've fixed this problem in my version of ftltime(), but added
code to print out lines like the following whenever my result was diff-
erent to the ftltime() library version and these lines happen only when
the library version is producing Jun-20 dates, i.e. my version behaves
the same in all other cases.
 sys 19200, secs 1209641007,  msecs 184,  First 4294952156,  Last 19168     
   First  lib_ftltime  0620.14:25:40.140,  my_ftltime  0501.21:22:52.844  
    Last  lib_ftltime  0501.21:23:27.152,  my_ftltime  0501.21:23:27.152  

As it turns out, I also checked the flow duration calculated directly
from Last-First against that of the ftt structs for First and Last for
both my version and the library version and both were always correct.
You wouldn't normally expect this from buggy code but it probably makes
perfect sense if you analysed the way way unsigned ints behave.

Here's my corrected code, and while I wouldn't expect all the comments
to make it into you repository, I think there needs to be something
indicating there was an issue in the past and that people should only
make changes with some care; in particular testing against cases when
sysUpTime has rolled-over while t hasn't.


/*
 * [DMT 13-Jun-2008] corrected version of based on ftltime()
 * ftltime returns a struct fttime = secs - sys + t
 * But fails to handle the case when sys has recently rolled-over 2^32
 * so sys is a small uint32_t value while t is close to 2^32 milliseconds.
 * The result is about 47 days wrong.
 *
 * The approach used to fix this is to treat (sys - t) as a signed number
 * which we expect to be quite small so we don't worry about corner cases
 * like -2^31.
 * 
 * I was guessing there would be a fixed relationship between sys and t,
 * so either sys or t would roll-over first. I haven't really looked but
 * suspect First is always behind sys. However while Last is normally
 * behind sys:
 *   sysUpTime 17660, unix_secs 1209641005,  First       1740,  Last  1740      
 * here's a counter-example from the same 5 minute flow file
 *   sysUpTime 13660, unix_secs 1209641001,  First 4294965144,  Last 14716
 *
 * NB the case of Last>sys did not produce an error with the original code
 * because the arithmetic was done in 2 stages, both involving the value
 * divided by 1,000, and hence not subject to roll-over. The problem only
 * happens when only one of sys and Last have rolled-over. Perhaps a correction
 * factor of 2^32/1000 could have been applied to both the seconds and
 * millisecond values.
 *
 * NB an alternative approach might be to use larger signed types, but the
 * binary '%' operator behaves as a mathematical mod operation for positive
 * values, i.e. for unsigned types. However "if either operand is negative,
 * the behaviour will be machine-dependent":
 *   C: A Reference Manual 4th ed, p202.
 * The best approach might well be to use doubles as they'll easily handle
 * 32 seconds bits + 10 milliseconds bits.
 * 
 */
struct fttime my_ftltime(uint32_t sys, uint32_t secs, uint32_t nsecs, uint32_t 
t)
{

  uint32_t diff, diff_s, diff_m;
  struct fttime ftt;
  int diff_was_negative = 0;

  /* unix seconds/nanoseconds to seconds/milliseconds */
  ftt.secs  = secs;
  ftt.msecs = nsecs / 1000000L;

  /* sys and t are in milliseconds */
  diff = sys - t;

  /* we don't expect the unsigned difference to be anywhere near 2^31  */
  /* but allow +/- 2^31, approx 2 billion anyway                       */
  if (diff > 2147483648) {
    diff_was_negative = 1;
    /* negate to have an absolute value before deriving secs/msecs     */
    diff = -diff;
  }

  diff_s = diff / 1000;
  diff_m = diff % 1000;

  /* calculate ftt -= diff */
  if (diff_was_negative) {
    /* subtracting a negative value means add */
    ftt.secs  += diff_s;
    ftt.msecs += diff_m;
    if (ftt.msecs >= 1000) {
      /* handle overflow */
      ftt.msecs -= 1000;
      ftt.secs++;
    }
  } else {
    /* simply subtract */
    ftt.secs -= diff_s;
    if (ftt.msecs < diff_m) {
      /* handle underflow */
      ftt.msecs += 1000;
      ftt.secs--;
    }
    ftt.msecs -= diff_m;
  }

  return ftt;

} /* ftltime */


Original issue reported on code.google.com by dmt3.14...@gmail.com on 17 Jun 2008 at 4:38

Hi!

I'd like to thank you for the analysis. Unfortunately I think this approach 
will be 
better: http://gitorious.org/projects/flow-
tools/repos/mainline/commits/de05981cd0038d0d802f0e6727b54d6d21688282

If you have any objections you can still raise them. I just don't want to 
include all 
those complexity here if it can be done relatively cheap, especially given that 
we're 
heading towards 64-bit platforms anyway (all my collectors are already 
64-bit...)

Original comment by therap...@gmail.com on 10 Feb 2009 at 2:04

  • Changed state: Fixed