AmadeusITGroup/cpubench1A

cpubench1a v3 unfair to non-AMD64 CPUs


v3 is built with Go 1.17, which introduces a new ABI that passes function arguments and results in registers instead of on the stack. However, in Go 1.17 this optimization was only implemented for AMD64.

Consequence: v3 results are biased in favor of Intel/AMD CPUs when they are compared against ARM CPUs.

Go 1.18 will extend the new ABI to other 64-bit architectures, including AArch64 (i.e. ARM64), on all operating systems. This should fix the bias.

Experiment: run the benchmark (v3 source code) on a Graviton2 box with different versions of Go:

With go 1.18beta1:

2022/01/25 18:16:57 Single thread
2022/01/25 18:16:57     Minimum: 119.693789
2022/01/25 18:16:57     Average: 120.162098
2022/01/25 18:16:57      Median: 120.204255
2022/01/25 18:16:57     Maximum: 120.513693
2022/01/25 18:16:57 
2022/01/25 18:16:57 Multi-thread
2022/01/25 18:16:57     Minimum: 3489.115843
2022/01/25 18:16:57     Average: 3493.844434
2022/01/25 18:16:57      Median: 3493.748363
2022/01/25 18:16:57     Maximum: 3501.293556

With go 1.17.6:

2022/01/25 19:27:37 Single thread
2022/01/25 19:27:37     Minimum: 102.406435
2022/01/25 19:27:37     Average: 102.516306
2022/01/25 19:27:37      Median: 102.517893
2022/01/25 19:27:37     Maximum: 102.605876
2022/01/25 19:27:37 
2022/01/25 19:27:37 Multi-thread
2022/01/25 19:27:37     Minimum: 2993.487060
2022/01/25 19:27:37     Average: 2997.341450
2022/01/25 19:27:37      Median: 2998.005405
2022/01/25 19:27:37     Maximum: 2999.150969

With go 1.16.13:

2022/01/26 10:58:48 Single thread
2022/01/26 10:58:48     Minimum: 101.121431
2022/01/26 10:58:48     Average: 101.281067
2022/01/26 10:58:48      Median: 101.279680
2022/01/26 10:58:48     Maximum: 101.417488
2022/01/26 10:58:48 
2022/01/26 10:58:48 Multi-thread
2022/01/26 10:58:48     Minimum: 2963.869610
2022/01/26 10:58:48     Average: 2967.229737
2022/01/26 10:58:48      Median: 2966.941988
2022/01/26 10:58:48     Maximum: 2971.097378

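On ARM64, Go 1.18 scores about 17% higher than Go 1.17 (average single-thread score 120.16 vs. 102.52), which is quite impressive. By contrast, Go 1.17 is only about 1% ahead of Go 1.16 on the same box.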
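The figure can be checked from the averages in the logs above. A minimal sketch; `percentGain` is an illustrative helper, not something taken from the benchmark source:

```go
package main

import "fmt"

// percentGain returns the relative change from before to after, in percent.
func percentGain(before, after float64) float64 {
	return (after/before - 1) * 100
}

func main() {
	// Average scores copied from the Graviton2 runs above
	// (go 1.17.6 as the baseline, go 1.18beta1 as the new value).
	fmt.Printf("single thread: %+.1f%%\n", percentGain(102.516306, 120.162098))
	fmt.Printf("multi-thread:  %+.1f%%\n", percentGain(2997.341450, 3493.844434))
}
```

Averages are used here; the medians give nearly the same ratios because the spread within each run is small.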

Experiment: run the benchmark (v3 source code) on an Intel Ice Lake box with different versions of Go:

With go 1.18beta1:

2022/01/26 13:31:41 Single thread
2022/01/26 13:31:41     Minimum: 187.765502
2022/01/26 13:31:41     Average: 188.087595
2022/01/26 13:31:41      Median: 188.082765
2022/01/26 13:31:41     Maximum: 188.414868
2022/01/26 13:31:41 
2022/01/26 13:31:41 Multi-thread
2022/01/26 13:31:41     Minimum: 3224.423573
2022/01/26 13:31:41     Average: 3228.537187
2022/01/26 13:31:41      Median: 3228.588813
2022/01/26 13:31:41     Maximum: 3234.624215

With go 1.17.6:

2022/01/26 12:31:38 Single thread
2022/01/26 12:31:38     Minimum: 183.593545
2022/01/26 12:31:38     Average: 184.533444
2022/01/26 12:31:38      Median: 184.630280
2022/01/26 12:31:38     Maximum: 184.752278
2022/01/26 12:31:38 
2022/01/26 12:31:38 Multi-thread
2022/01/26 12:31:38     Minimum: 3177.209433
2022/01/26 12:31:38     Average: 3183.845775
2022/01/26 12:31:38      Median: 3184.258358
2022/01/26 12:31:38     Maximum: 3187.235816

With go 1.16.13:

2022/01/26 12:07:04 Single thread
2022/01/26 12:07:04     Minimum: 165.826124
2022/01/26 12:07:04     Average: 166.258367
2022/01/26 12:07:04      Median: 166.295972
2022/01/26 12:07:04     Maximum: 166.473568
2022/01/26 12:07:04 
2022/01/26 12:07:04 Multi-thread
2022/01/26 12:07:04     Minimum: 2957.437883
2022/01/26 12:07:04     Average: 2959.839765
2022/01/26 12:07:04      Median: 2959.992776
2022/01/26 12:07:04     Maximum: 2962.311325
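To see where the gain lands on Intel, the single-thread averages above can be compared release by release. This is a rough sketch; the `run` type and the `stepGains` helper are made up for illustration:

```go
package main

import "fmt"

// run pairs a Go version with the average score it produced.
type run struct {
	goVersion string
	average   float64
}

// iceLakeSingleThread holds the single-thread averages from the
// Ice Lake logs above, ordered from oldest to newest Go release.
var iceLakeSingleThread = []run{
	{"1.16.13", 166.258367},
	{"1.17.6", 184.533444},
	{"1.18beta1", 188.087595},
}

// stepGains returns the percentage change between each pair of
// consecutive runs, so the result has len(runs)-1 entries.
func stepGains(runs []run) []float64 {
	var gains []float64
	for i := 1; i < len(runs); i++ {
		gains = append(gains, (runs[i].average/runs[i-1].average-1)*100)
	}
	return gains
}

func main() {
	for i, g := range stepGains(iceLakeSingleThread) {
		fmt.Printf("%s -> %s: %+.1f%%\n",
			iceLakeSingleThread[i].goVersion,
			iceLakeSingleThread[i+1].goVersion, g)
	}
}
```

With these numbers, most of the Intel improvement appears between 1.16 and 1.17 (roughly 11%), while 1.17 to 1.18 adds only about 2%. That is consistent with the register ABI having reached AMD64 one release earlier than ARM64.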

Fixed by release version 3.1, which is compiled with Go 1.18.