Compute the average of 256 * 1024 * 1024 (2^28) floats (1 GB of data).
Sequential methods:
- Simple loop
- 128-bit SIMD loop
- Widest available SIMD loop
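The three sequential variants can be sketched roughly as follows. This is an illustrative sketch, not the repository's code: the class name `AverageSketch` is hypothetical, and mapping the 128-bit variant to `System.Numerics.Vector4` and the widest variant to `System.Numerics.Vector<float>` is an assumption based on the method names in the results table. The sketch accumulates in single precision; over 2^28 elements that can lose accuracy, and the source does not say which accumulator the benchmark uses.

```csharp
using System;
using System.Numerics;

// Hypothetical sketch of the three sequential variants; not the repository's code.
public static class AverageSketch
{
    // Simple loop: one float per iteration.
    public static float SimpleAverage(float[] data)
    {
        float sum = 0f;
        for (int i = 0; i < data.Length; i++)
            sum += data[i];
        return sum / data.Length;
    }

    // 128-bit SIMD loop: four floats per iteration (assumed to use Vector4).
    public static float Vector4Average(float[] data)
    {
        Vector4 acc = Vector4.Zero;
        int i = 0;
        for (; i <= data.Length - 4; i += 4)
            acc += new Vector4(data[i], data[i + 1], data[i + 2], data[i + 3]);
        float sum = acc.X + acc.Y + acc.Z + acc.W;
        for (; i < data.Length; i++)   // scalar tail for leftover elements
            sum += data[i];
        return sum / data.Length;
    }

    // Widest SIMD loop: Vector<float>.Count floats per iteration.
    public static float VectorizedAverage(float[] data)
    {
        int width = Vector<float>.Count;
        Vector<float> acc = Vector<float>.Zero;
        int i = 0;
        for (; i <= data.Length - width; i += width)
            acc += new Vector<float>(data, i);
        // Horizontal sum of the lanes: dot product with a vector of ones.
        float sum = Vector.Dot(acc, Vector<float>.One);
        for (; i < data.Length; i++)   // scalar tail for leftover elements
            sum += data[i];
        return sum / data.Length;
    }

    public static void Main()
    {
        // Small input for demonstration; the benchmark itself uses 2^28 floats.
        var data = new float[1000];
        for (int i = 0; i < data.Length; i++)
            data[i] = i % 8;

        Console.WriteLine(SimpleAverage(data));
        Console.WriteLine(Vector4Average(data));
        Console.WriteLine(VectorizedAverage(data));
    }
}
```

All three methods should agree on this small input, because every partial sum is a small integer that a float represents exactly.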
Parallel methods: each sequential method, run in parallel with as many threads as cores
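One possible way to parallelize a sequential kernel is to split the array into one contiguous chunk per worker, sum each chunk independently, and combine the partial sums at the end. The sketch below is an assumption about the approach, not the repository's code: `ParallelAverageSketch` and the `workers` parameter are hypothetical names, `Environment.ProcessorCount` (which counts logical processors) stands in for the core count, and `Parallel.For` does not guarantee that each chunk runs on its own dedicated thread.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch of the chunked parallel variant; not the repository's code.
public static class ParallelAverageSketch
{
    public static float ParallelAverage(float[] data, int workers)
    {
        var partial = new float[workers];                  // one partial sum per chunk
        int chunk = (data.Length + workers - 1) / workers; // chunk size, rounded up

        Parallel.For(0, workers, w =>
        {
            int start = w * chunk;
            int end = Math.Min(start + chunk, data.Length);
            float sum = 0f;
            // Any sequential kernel (simple or SIMD) could be used on this range.
            for (int i = start; i < end; i++)
                sum += data[i];
            partial[w] = sum;                              // each worker writes only its own slot
        });

        float total = 0f;
        foreach (float p in partial)
            total += p;
        return total / data.Length;
    }

    public static void Main()
    {
        // Small input for demonstration; the benchmark itself uses 2^28 floats.
        var data = new float[1000];
        for (int i = 0; i < data.Length; i++)
            data[i] = i % 8;

        Console.WriteLine(ParallelAverage(data, Environment.ProcessorCount));
    }
}
```

Giving each worker its own slot in `partial` avoids locking inside the hot loop; the final combine step touches only `workers` values.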
Run benchmark.sh from the repo root.
Result on i7-6820HK (4 cores, 8 threads, Vector<float>.Count == 8):
Method                     |      Mean |     Error |    StdDev |    Median | Scaled |
-------------------------- |----------:|----------:|----------:|----------:|-------:|
SimpleAverage              | 299.23 ms | 3.5739 ms | 3.3431 ms | 298.19 ms |   1.00 |
ParallelAverage            |  41.61 ms | 0.9273 ms | 0.7743 ms |  41.47 ms |   0.14 |
Vector4Average             | 133.14 ms | 1.5449 ms | 1.4451 ms | 133.51 ms |   0.44 |
ParallelVector4Average     |  37.54 ms | 0.2755 ms | 0.2442 ms |  37.51 ms |   0.13 |
VectorizedAverage          |  56.89 ms | 1.2904 ms | 3.2136 ms |  55.19 ms |   0.19 |
ParallelVectorizedAverage  |  36.03 ms | 0.3677 ms | 0.3440 ms |  35.94 ms |   0.12 |