rvesse/sparql-query-bm

Possible Incorrect Variance

As I was playing around with this, I noticed what appears to be a discrepancy between the reported variance and standard deviation. The values were (in this case for the Operation Mix Summary):

Mix Runtime Variance: 4490.498657509938s
Mix Runtime Standard Deviation: 0.0021190796722893495s

The code was very simple:

BenchmarkOptions ops = new BenchmarkOptions();
ProgressListener pl = new ConsoleProgressListener();
ops.setRuns(10);
ops.setWarmups(5);
ops.setQueryEndpoint(REMOVED);
ops.addListener(pl);

List<Operation> operationsList = new ArrayList<Operation>();

operationsList.add(new FixedQueryOperation("Query 1", "ASK WHERE { ?s a ?type }"));
operationsList.add(new FixedQueryOperation("Query 2", "ASK WHERE { ?s a ?type }"));

ops.setOperationMix(new OperationMixImpl(operationsList));

BenchmarkRunner runner = new BenchmarkRunner();
runner.run(ops); 

I took a brief look and am not really sure what is going wrong.

I noticed that there was a bug in calculating variance for individual operations (it was returned in seconds when the javadoc stated it is returned in nanoseconds). As far as variance for the overall mix goes, though, the code looks correct and does not rely on the variance calculated for individual operations.

The code defers to commons-math for statistics like variance and standard deviation, so maybe we are hitting an overflow or something of that ilk?

I have now reproduced this based on your setup:

Mix Runtime Variance: 375.7961875s
Mix Runtime Standard Deviation: 6.130221753737788E-4s

My guess is that we are encountering some sort of overflow bug. I am going to test out a possible fix and see if that resolves the problem.

The problem is not that the variance is incorrect but that the presentation of it is incorrect.

Variance is essentially the standard deviation squared, so for a start it should be reported in seconds squared rather than seconds.

The second problem is that internally everything is stored as nanoseconds for maximum precision and the presentation code converts to seconds. However, because variance is a squared quantity, its internal unit is nanoseconds squared, so converting it to seconds squared means dividing by 10^18 rather than 10^9. The conversion code does not account for those extra orders of magnitude.

So I think the fix for this issue is two-fold:

  1. Present variance as seconds squared
  2. Convert variances from nanoseconds squared to seconds squared appropriately
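To illustrate the second point, here is a minimal sketch of the unit conversion. It is not the actual sparql-query-bm code: the class `VarianceUnits` and its method names are hypothetical and exist only for this example. It takes the standard deviation from the original report (0.0021190796722893495s, expressed in nanoseconds), squares it to get a variance in nanoseconds squared, and compares dividing by 10^9 with dividing by 10^18.

```java
// Hypothetical helper for illustration only; not part of sparql-query-bm.
final class VarianceUnits {
    static final double NANOS_PER_SECOND = 1e9;

    // Converts a plain duration (or standard deviation) from ns to s.
    static double nanosToSeconds(double nanos) {
        return nanos / NANOS_PER_SECOND;
    }

    // Converts a variance from ns^2 to s^2: the scale factor is squared too.
    static double nanosSquaredToSecondsSquared(double nanosSquared) {
        return nanosSquared / (NANOS_PER_SECOND * NANOS_PER_SECOND);
    }

    public static void main(String[] args) {
        // Standard deviation from the original report, expressed in nanoseconds.
        double stdDevNanos = 2119079.6722893495;
        double varianceNanosSquared = stdDevNanos * stdDevNanos;

        // Dividing by 1e9 only: gives roughly 4490.5, the misleading figure from the report.
        System.out.println("mis-scaled: " + nanosToSeconds(varianceNanosSquared) + "s");

        // Dividing by 1e18: a value whose square root is the reported standard deviation.
        double varianceSecondsSquared = nanosSquaredToSecondsSquared(varianceNanosSquared);
        System.out.println("variance:   " + varianceSecondsSquared + "s^2");
        System.out.println("sqrt check: " + Math.sqrt(varianceSecondsSquared) + "s");
    }
}
```

Under these assumptions the reported variance and standard deviation agree with each other once the variance is scaled by 10^18, which is consistent with the calculation being correct and only the presentation being wrong.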

Fixed; this will be included in the next release.

Fix released as part of 2.0.1