Benchmark script for comparing cloud provider CPU performance.
The script uses the benchmark program starve-check by Doug Gale, which stresses the CPU with a tight computation loop (no waits) and counts the number of iterations each thread manages per unit of time (one second in this project).
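starve-check itself is a small C++ program; the sketch below is only a rough Python illustration of the same idea, not the actual starve-check code: one busy-looping worker per reported CPU, each counting how many iterations it completes in a one-second interval. All names and the output format here are illustrative.

```python
# Illustrative sketch only -- NOT the actual starve-check implementation.
# One busy-looping worker per reported CPU, each counting the iterations
# it completes in a one-second interval. Processes are used instead of
# threads so the Python GIL does not serialize the busy loops.
import multiprocessing as mp
import time

def busy_worker(queue, seconds=1.0):
    """Spin in a tight loop for `seconds` and report the iteration count."""
    count = 0
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        count += 1
    queue.put(count)

if __name__ == "__main__":
    cpus = mp.cpu_count()
    queue = mp.Queue()
    workers = [mp.Process(target=busy_worker, args=(queue,)) for _ in range(cpus)]
    for w in workers:
        w.start()
    counts = [queue.get() for _ in range(cpus)]
    for w in workers:
        w.join()
    # One figure per CPU, in millions of iterations per second.
    print(" ".join(f"{c / 1e6:.0f}M" for c in counts))
```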
This project adds statistics to the test runs, making it easier to:
- Reason about the CPU performance of a given cloud provider.
- Contrast the different compute characteristics of different sizes of cloud servers.
- Discern the variability in CPU capacity the providers give their customers.
- Compare offerings, so you get the most CPU cycles per $ spent.
```
$ docker run --rm jojje/cloud-cpu-bench --help
usage: cloud-cpu-bench [OPTIONS]

Cloud CPU benchmark. Calculates how many millions of operations per second are
being performed on each CPU on a linux machine. The CPU metrics are averaged
so that one can gauge per-core scalability and compare those figures between
cloud providers, and between differently sized machines on the same provider,
to gain most bang for the buck. After sampling completes, variance statistics
are provided, to highlight how (un)reliable the provider's offering is. The
benchmark program used is the following: https://github.com/doug65536/starve-
check

optional arguments:
  -h, --help         show this help message and exit
  -t N, --threads N  Number of threads to use. Defaults the same number as
                     there are "cpus" reported for the system (default: 4)
  -s N, --samples N  Number of samples to grab during the test (one sample
                     every second). Must be at least 2 samples (default: 100)
  --version          show program's version number and exit
```
```
$ docker run --rm jojje/cloud-cpu-bench
73M 72M 74M 77M
79M 74M 80M 74M
...
==[ CPU PERFORMANCE (Mops/cpu) ]======================================
P99           63.25
P95           65.0
P90           66.75
Mean          72.55
StDev         3.614
Median        73.5
==[ CPU INFO ]========================================================
CPUs          4
Model         Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
MHz           2494.138
Cache         4096 KB
BogoMips      4988.27
==[ TEST INFO ]=======================================================
Samples       100
Threads       4
Generated     2020-08-29 17:26:59 +0000
Kernal        Linux 6e1200833bea 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020 x86_64 Linux
Version       0.1.3
Starve-check  f14d7ba8152c8e09179fea3f47fd6b3a93f13cae
```
The statistics are computed by averaging the number of operations all CPUs process each second. The unit is mega-operations per second per CPU (Mops/cpu). For the example above, the mean total number of operations per second across all vCPUs is therefore the per-vCPU mean multiplied by the number of threads, i.e. ~72.55 * 4 ≈ 290 Mop/s.
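As a hypothetical sketch of that arithmetic (the actual implementation in cloud-cpu-bench may differ), each per-second sample row is first averaged across CPUs, and the summary figures are then taken over those per-sample averages:

```python
# Hypothetical sketch of the arithmetic described above; the actual
# implementation in cloud-cpu-bench may differ.
from statistics import mean, median, stdev

# One row per one-second sample, one column per CPU (Mop/s), e.g. the
# first two rows of the example output above.
samples = [
    [73, 72, 74, 77],
    [79, 74, 80, 74],
    # ... 100 rows in total in the example run
]

per_sample_avg = [mean(row) for row in samples]   # Mops per CPU, per second
per_cpu_mean = mean(per_sample_avg)               # "Mean" in the report
threads = len(samples[0])

print(f"Mean per vCPU : {per_cpu_mean:.2f} Mop/s")
print(f"StDev         : {stdev(per_sample_avg):.3f}")
print(f"Median        : {median(per_sample_avg)}")
# Total throughput of the machine = per-vCPU mean * number of threads,
# e.g. ~72.55 * 4 ≈ 290 Mop/s for the full example run.
print(f"Total         : {per_cpu_mean * threads:.0f} Mop/s")
```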
For those seeking guaranteed performance (the lowest throughput to expect), the percentile figures are more relevant. P99 means that 99% of all samples achieved just over 63 Mop/s.
This tells us that this particular VM type may not be suitable for workloads that need a sustained throughput above 63 Mop/s, or for latency-sensitive workloads, since the variance is fairly high.
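Read this way, the P-figures are throughput floors: P99 is the level that 99% of samples met or exceeded, which corresponds to the 1st percentile of the per-sample averages. Assuming that interpretation (the tool itself may compute these differently), such a floor could be derived like this:

```python
# Assumed interpretation of the P-figures: P99 is the 1st percentile of
# the per-sample averages (the level 99% of samples met or exceeded).
# The actual tool may compute these differently.
from statistics import quantiles

def perf_floor(per_sample_avg, percent):
    """Throughput that `percent`% of samples met or exceeded."""
    cuts = quantiles(per_sample_avg, n=100)  # 99 cut points: P1 .. P99
    return cuts[100 - percent - 1]           # e.g. percent=99 -> cuts[0]

# Illustrative dummy values; a real run would use 100 per-sample averages.
per_sample_avg = [74.0, 76.75, 71.5, 68.25, 73.5]
print(f"P90 floor: {perf_floor(per_sample_avg, 90):.2f} Mop/s")
```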