rzezeski/libMicro

Remove batching

Closed this issue · 0 comments

I'm fairly sure batches were only used because of coarse-resolution
timers on older platforms. Batching was a way to make sure the
benchmark didn't finish faster than the timing code could measure,
which would literally produce results claiming the benchmark took
no time at all. On modern hardware there is no excuse for a
platform not to provide a nanosecond-resolution timing source,
which eliminates the need for batching for the most part.

The problem with batching is that it hides latency. Each batch
reports an average, so the stats calculated over batches are stats
of averages, not of the raw results.

For example, a batch of 2 with values 10 and 100 reports a latency
of 55. This hides the fact that the latency is actually
multimodal. It produces misleading results when repeatedly running
the mprotect benchmark with a batch size of 1 and then with 2 or
more. With a batch of 1 the reported usecs/call fluctuates wildly,
but with a batch of 2 it doesn't, because the numbers are being
softened by averaging.

Some benchmarks like log and exp take such a short time to run
that they can cause issues when not batched. There is some amount
of variance in the timing calls themselves, and when the benchmark
length starts approaching that variance there is a chance of
negative numbers. For example, if libMicro expects getnsecs() to
take 65ns to complete but a given call takes only 61ns, that 4ns
of variance is attributed to the benchmark itself. If the
benchmark only takes 3ns to run, a -1ns run time will be reported.

I could keep the batching just for those really fast calls, but
that assumes a normal distribution for latency. Assumptions like
that tend to bite you in the ass.

I could attempt to determine the variance in the timing code
itself and take that into account, which would have the side
benefit of improving the accuracy of all benchmarks.

Or I could punt for now and think about it some more.