SonarSource/sonar-dotnet

Rule S6608: Benchmark is benchmarking the wrong things


I got an S6608 warning from SonarLint in VS Code and ran the benchmark referenced in the documentation.

The result is:

|      Method | SampleSize | LoopSize |        Mean |     Error |    StdDev |
|------------ |----------- |--------- |------------:|----------:|----------:|
|   ElementAt |    1000000 |     1000 | 11,614.7 ns | 199.34 ns | 204.71 ns |
|       Index |    1000000 |     1000 |  7,756.9 ns |  54.88 ns |  45.83 ns |
|       First |    1000000 |     1000 |  6,223.9 ns |  23.47 ns |  19.60 ns |
| First_Index |    1000000 |     1000 |    483.3 ns |   9.31 ns |  14.50 ns |
|        Last |    1000000 |     1000 |  6,483.0 ns | 128.93 ns | 188.98 ns |
|  Last_Index |    1000000 |     1000 |    380.2 ns |   7.19 ns |   7.06 ns |

The results suggest that ElementAt and Index are slow, but the actual problem is that every loop iteration generates one random value, which is much slower than the array access itself:

var index = random.Next(0, SampleSize);

A simple benchmark tells me that random.Next(0, SampleSize) itself takes around 7,000 ns, so it has a huge impact on the actual benchmark results (the RandomNext method below measures it in isolation). Without random.Next(0, SampleSize), ElementAt becomes much faster and Index is on par with First_Index and Last_Index:

|      Method | SampleSize | LoopSize |       Mean |     Error |    StdDev |
|------------ |----------- |--------- |-----------:|----------:|----------:|
|   ElementAt |    1000000 |     1000 | 3,450.9 ns |  68.35 ns | 104.38 ns |
|       Index |    1000000 |     1000 |   530.0 ns |   9.63 ns |   8.54 ns |
|       First |    1000000 |     1000 | 6,739.5 ns | 134.67 ns | 209.66 ns |
| First_Index |    1000000 |     1000 |   528.3 ns |   3.57 ns |   2.78 ns |
|        Last |    1000000 |     1000 | 6,803.1 ns |  44.47 ns |  39.42 ns |
|  Last_Index |    1000000 |     1000 |   459.6 ns |   1.66 ns |   1.30 ns |

The modified code is shown below. The only disadvantage of not using a random index is that the CPU cache is now used optimally, which makes the results look slightly better than they would for truly random access.

To avoid the risk of the benchmarked code being optimized away, I added a return value to all benchmark methods (https://fransbouma.github.io/BenchmarkDotNet/RulesOfBenchmarking.htm#avoid-dead-code-elimination).

private List<byte> data;
private Random random;

[Params(1_000_000)]
public int SampleSize;

[Params(1_000)]
public int LoopSize;

[GlobalSetup]
public void Setup()
{
    random = new Random(42);
    var bytes = new byte[SampleSize];
    random.NextBytes(bytes);
    data = bytes.ToList();
}

[Benchmark]
public int ElementAt()
{
    int result = default;

    for (int i = 0; i < LoopSize; i++)
    {
        // Sequential access through LINQ's ElementAt; no random call per iteration.
        result = data.ElementAt(i);
    }

    return result;
}

[Benchmark]
public int Index()
{
    int result = default;

    for (int i = 0; i < LoopSize; i++)
    {
        result = data[i];
    }

    return result;
}

[Benchmark]
public int RandomNext()
{
    int result = default;

    for (int i = 0; i < LoopSize; i++)
    {
        // Measures the cost of the random number generation alone.
        result = random.Next(0, SampleSize);
    }

    return result;
}

Hi @Apollo3zehn,

Thanks for the feedback. Designing a good benchmark is not an easy task, and it's easy to fall into all kinds of traps :)

I think the current approach, in the case of ElementAt and Index, was comparing apples with apples, in the sense that random numbers were generated in the same way in both cases. If only the relative difference is considered, the results do reflect which method is faster.

I see your point that Random.Next() is much more expensive than the element access, giving a false impression that accessing the array elements is much more expensive than it actually is.

Our first thought to improve this was to use a predefined list of randomly generated numbers (filled in the setup), but that would also mean accessing arrays twice in each of the benchmarked methods, again leading to numbers greater than they actually are.
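
For illustration, this is roughly what that rejected approach would look like (the indices field and this setup are a hypothetical sketch, not code from the rule's benchmark):

private int[] indices;

[GlobalSetup]
public void Setup()
{
    random = new Random(42);
    var bytes = new byte[SampleSize];
    random.NextBytes(bytes);
    data = bytes.ToList();

    // Precompute the random indices so no Random.Next call happens in the loop...
    indices = new int[LoopSize];
    for (int i = 0; i < LoopSize; i++)
    {
        indices[i] = random.Next(0, SampleSize);
    }
}

[Benchmark]
public int ElementAt()
{
    int result = default;

    for (int i = 0; i < LoopSize; i++)
    {
        // ...but every iteration now pays for two lookups:
        // one into indices and one into data.
        result = data.ElementAt(indices[i]);
    }

    return result;
}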

Another approach we considered was to use a pseudo-random number generator fast enough not to distort the numbers while still providing a degree of randomness. An example would be one of the generators invented by George Marsaglia.
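
For illustration, a minimal sketch of such a generator, assuming Marsaglia's 32-bit xorshift (the field and method names here are hypothetical, not something we shipped):

// 32-bit xorshift (Marsaglia): three shift/xor steps per number.
// The state must be seeded with a non-zero value.
private uint xorshiftState = 42;

private uint XorShift32()
{
    uint x = xorshiftState;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    xorshiftState = x;
    return x;
}

// Usage inside a benchmark loop: a cheap pseudo-random index into data.
// var index = (int)(XorShift32() % (uint)SampleSize);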

We ended up with the simple solution you suggested: removing the calls to the random number generator.

On the PC I use, the numbers look like this. I agree this gives a more consistent overall view of all the benchmarked methods.

BenchmarkDotNet=v0.13.5, OS=Windows 10 (10.0.19045.4412/22H2/2022Update)
11th Gen Intel Core i7-11850H 2.50GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=8.0.301
  [Host]   : .NET 8.0.6 (8.0.624.26715), X64 RyuJIT AVX2
  .NET 8.0 : .NET 8.0.6 (8.0.624.26715), X64 RyuJIT AVX2

|      Method |      Job |  Runtime | SampleSize | LoopSize |       Mean |     Error |    StdDev |
|------------ |--------- |--------- |----------- |--------- |-----------:|----------:|----------:|
|   ElementAt | .NET 8.0 | .NET 8.0 |    1000000 |     1000 | 3,403.4 ns |  28.52 ns |  26.67 ns |
|       Index | .NET 8.0 | .NET 8.0 |    1000000 |     1000 |   478.0 ns |   6.93 ns |   6.48 ns |
|       First | .NET 8.0 | .NET 8.0 |    1000000 |     1000 | 6,160.0 ns |  57.66 ns |  53.93 ns |
| First_Index | .NET 8.0 | .NET 8.0 |    1000000 |     1000 |   485.7 ns |   5.81 ns |   5.15 ns |
|        Last | .NET 8.0 | .NET 8.0 |    1000000 |     1000 | 6,034.3 ns |  20.34 ns |  16.98 ns |
|  Last_Index | .NET 8.0 | .NET 8.0 |    1000000 |     1000 |   408.3 ns |   2.54 ns |   2.38 ns |

I do have a question about the return value. I'm not sure it makes a difference, as we also assign the retrieved value in the current implementation. I did a test and the numbers look very similar:

|      Method | SampleSize | LoopSize |       Mean |    Error |   StdDev |
|------------ |----------- |--------- |-----------:|---------:|---------:|
|   ElementAt |    1000000 |     1000 | 3,755.0 ns | 48.69 ns | 40.66 ns |
|       Index |    1000000 |     1000 |   481.4 ns |  6.45 ns |  5.72 ns |
|       First |    1000000 |     1000 | 6,240.3 ns | 57.20 ns | 53.51 ns |
| First_Index |    1000000 |     1000 |   433.8 ns |  1.15 ns |  1.08 ns |
|        Last |    1000000 |     1000 | 6,472.4 ns | 19.50 ns | 18.24 ns |
|  Last_Index |    1000000 |     1000 |   359.9 ns |  0.82 ns |  0.68 ns |

The last example is the one without a return statement.

Do you have more context about this? Or is there something that I'm missing?

> Our first thought to improve this was to use a predefined list of randomly generated numbers (filled in the setup), but that would also mean accessing arrays twice in each of the benchmarked methods, again leading to numbers greater than they actually are.

That was also my first idea, and I came to the same conclusion :-)

> I do have a question about the return value. I'm not sure it makes a difference, as we also assign the retrieved value in the current implementation.
> Do you have more context about this? Or is there something that I'm missing?

In the current implementation, you assign the value to a discard variable (_) and the method has no return value. This could let the compiler optimize out the very line of code we want to benchmark. Good practice for benchmarks is therefore to make use of the return value, as also stated here: Avoid dead code elimination. The difference is sketched below.
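
A hypothetical pair of methods to illustrate the difference (not the actual rule benchmark):

[Benchmark]
public void Index_Discard()
{
    for (int i = 0; i < LoopSize; i++)
    {
        // The result is never observed, so the JIT is in principle
        // free to eliminate the whole loop body as dead code.
        _ = data[i];
    }
}

[Benchmark]
public int Index_Return()
{
    int result = default;

    for (int i = 0; i < LoopSize; i++)
    {
        // The value flows into the return value, so the element
        // access cannot be eliminated as dead code.
        result = data[i];
    }

    return result;
}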

I agree that there is no difference in this specific benchmark with the current .NET 8 compiler, but that might change without us noticing.