A low overhead nanosecond TSC benchmark

TSC Benchmark

A low-overhead TSC benchmark that lets you measure small sections of code with nanosecond accuracy.

Example

benchmarking::TSCBenchmarking benchmark{};
benchmark.Initialize();
auto result = benchmark.Run(code_for_benchmarking, settings);
std::cout << "Benchmark result: " << result.time_ << " ns" << std::endl;

RDTSC And RDTSCP Instructions

The benchmark uses the rdtsc instruction and simple arithmetic operations to implement a clock with 1 ns precision. The clock is fast and stable in terms of latency: a reading takes less than 10 ns.
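The "simple arithmetic" can be illustrated with a fixed-point multiply-and-shift conversion from ticks to nanoseconds. This is only a minimal sketch, not the library's actual code: TickConverter and its members are hypothetical names, the TSC frequency is assumed to be known from an earlier calibration step, and unsigned __int128 assumes GCC or Clang.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the library): converts TSC ticks to
// nanoseconds with one multiplication and one shift, so the hot path
// never executes a division.
struct TickConverter {
    static constexpr unsigned kShift = 32;
    std::uint64_t mult;  // nanoseconds per tick, scaled by 2^kShift

    // tsc_hz is assumed to come from a calibration step that is not shown.
    explicit TickConverter(std::uint64_t tsc_hz)
        : mult(static_cast<std::uint64_t>(
              (static_cast<unsigned __int128>(1000000000ULL) << kShift) /
              tsc_hz)) {}

    std::uint64_t ToNanoseconds(std::uint64_t ticks) const {
        // 128-bit intermediate so that large tick counts do not overflow.
        return static_cast<std::uint64_t>(
            (static_cast<unsigned __int128>(ticks) * mult) >> kShift);
    }
};
```

The result is rounded down, so a converted value may be slightly smaller than the exact one.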

The rdtscp instruction can also be used to check that the program did not migrate to another CPU between TSC reads, which can significantly distort the measurements. To check for CPU migration during benchmarks, set the TSCBenchmarking template parameter bool CheckCpuMigration to true.
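A migration check of this kind might look like the sketch below. The names TscSample, ReadTscp, CpuFromAux and Migrated are hypothetical and not taken from the library. The sketch assumes GCC or Clang on x86 (for the __rdtscp intrinsic) and assumes the Linux convention of storing the CPU number in the low 12 bits of IA32_TSC_AUX.

```cpp
#include <cstdint>
#include <x86intrin.h>  // __rdtscp (GCC/Clang on x86)

// Hypothetical record of one rdtscp read.
struct TscSample {
    std::uint64_t ticks;
    std::uint32_t aux;  // value of IA32_TSC_AUX at the time of the read
};

inline TscSample ReadTscp() {
    unsigned int aux = 0;
    const std::uint64_t ticks = __rdtscp(&aux);
    return {ticks, aux};
}

// Assumption: the OS (Linux here) keeps the CPU number in the low 12 bits.
inline std::uint32_t CpuFromAux(std::uint32_t aux) { return aux & 0xFFFu; }

// Two samples taken on different CPUs carry different aux values.
inline bool Migrated(const TscSample& begin, const TscSample& end) {
    return begin.aux != end.aux;
}
```

A measurement for which Migrated returns true can simply be discarded and repeated.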

Hardware Support

The benchmark checks that your /proc/cpuinfo lists the nonstop_tsc and constant_tsc flags. In general, on all modern x86 systems the TSC runs at a constant rate regardless of the P-state and does not stop in deep C-states.
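A flag check along these lines could be sketched as follows. HasFlag and HasStableTscFlags are hypothetical helpers, not the library's API, and the sketch assumes the usual Linux /proc/cpuinfo layout in which a line starting with "flags" holds a whitespace-separated list.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Returns true if the whitespace-separated list contains exactly `flag`.
inline bool HasFlag(const std::string& flags_line, const std::string& flag) {
    std::istringstream in(flags_line);
    std::string token;
    while (in >> token) {
        if (token == flag) return true;
    }
    return false;
}

// Inspects the first "flags" line of a cpuinfo-style stream.
inline bool HasStableTscFlags(std::istream& cpuinfo) {
    std::string line;
    while (std::getline(cpuinfo, line)) {
        if (line.rfind("flags", 0) == 0) {  // line starts with "flags"
            return HasFlag(line, "constant_tsc") &&
                   HasFlag(line, "nonstop_tsc");
        }
    }
    return false;
}
```

To use it, open /proc/cpuinfo with std::ifstream and pass the stream to HasStableTscFlags.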

The benchmark also checks whether your system supports Invariant TSC, since its absence can significantly reduce the accuracy of measurements.
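For reference, Invariant TSC is reported in CPUID leaf 0x80000007, EDX bit 8. The sketch below shows one way to query it with the __get_cpuid helper from <cpuid.h> (GCC or Clang on x86 is assumed). The function names are hypothetical, and the library may perform this check differently.

```cpp
#include <cpuid.h>  // __get_cpuid (GCC/Clang on x86)

// Invariant TSC flag: CPUID.80000007H:EDX[8].
inline bool InvariantTscBit(unsigned int edx) { return ((edx >> 8) & 1u) != 0; }

inline bool HasInvariantTsc() {
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid returns 0 when the requested leaf is not supported.
    if (__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx) == 0) return false;
    return InvariantTscBit(edx);
}
```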

TSC Reordering

The compiler, and the CPU through out-of-order execution, may reorder the TSC reads relative to the code being measured. To avoid this, the benchmarking::TSCClock<Barrier BarrierType> class is used, which implements several barrier approaches:

OneCpuId barrier (default barrier type):

cpuid
rdtsc
code
cpuid
rdtsc

LFence barrier:

cpuid
rdtsc
code
lfence
rdtsc
cpuid

MFence barrier:

cpuid
rdtsc
code
cpuid
rdtsc
mfence

Rdtscp barrier (Intel approach):

cpuid
rdtsc
code
rdtscp
cpuid

Four cpuid barrier:

cpuid
rdtsc
cpuid
code
cpuid
rdtsc
cpuid
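As an illustration, the Rdtscp sequence listed above (cpuid, rdtsc, code, rdtscp, cpuid) could be written with inline assembly roughly as follows. This is a sketch under assumptions, not the TSCClock implementation: it targets x86-64 with GCC or Clang, and TscBegin, TscEnd and MeasureTicks are hypothetical names.

```cpp
#include <cstdint>

// Start of the interval: cpuid serializes earlier instructions,
// then rdtsc reads the counter into edx:eax.
inline std::uint64_t TscBegin() {
    std::uint32_t lo, hi;
    asm volatile(
        "cpuid\n\t"
        "rdtsc"
        : "=a"(lo), "=d"(hi)
        : "a"(0)
        : "rbx", "rcx", "memory");
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// End of the interval: rdtscp waits for earlier instructions to execute,
// the result is saved, and cpuid keeps later instructions from moving up.
inline std::uint64_t TscEnd() {
    std::uint32_t lo, hi;
    asm volatile(
        "rdtscp\n\t"
        "mov %%eax, %0\n\t"
        "mov %%edx, %1\n\t"
        "xor %%eax, %%eax\n\t"
        "cpuid"
        : "=&r"(lo), "=&r"(hi)
        :
        : "rax", "rbx", "rcx", "rdx", "memory");
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Returns the raw tick count spent in `code`, including barrier overhead.
template <typename Fn>
std::uint64_t MeasureTicks(Fn&& code) {
    const std::uint64_t begin = TscBegin();
    code();
    const std::uint64_t end = TscEnd();
    return end - begin;
}
```

The "memory" clobber acts as the compiler barrier, while cpuid and rdtscp constrain the CPU. The other barrier types differ only in which instructions surround the two reads.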

TSC Overhead

In the benchmarking::TSCBenchmarking::Initialize method, the benchmark prepares and configures the system and calibrates the TSC for accurate results.

In addition, it runs several tests to estimate the overhead of the TSC reads themselves, which then needs to be subtracted from the final measured time.
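One common way to estimate such an overhead is to time an empty interval many times and keep the smallest value. The sketch below does that. Taking the minimum is an assumption about the method, and EstimateOverhead and SubtractOverhead are hypothetical names. The clock is passed in as a callable so that the logic does not depend on a particular TSC read function.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Times an empty interval `samples` times and returns the smallest delta,
// which approximates the fixed cost of two back-to-back clock reads.
template <typename ReadClock>
std::uint64_t EstimateOverhead(ReadClock read_clock, int samples) {
    if (samples <= 0) return 0;
    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (int i = 0; i < samples; ++i) {
        const std::uint64_t begin = read_clock();
        const std::uint64_t end = read_clock();
        best = std::min(best, end - begin);
    }
    return best;
}

// Removes the estimated overhead from a measurement, never going below 0.
inline std::uint64_t SubtractOverhead(std::uint64_t measured,
                                      std::uint64_t overhead) {
    return measured > overhead ? measured - overhead : 0;
}
```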

Run Benchmark

After initialization, you can run the benchmark using the benchmarking::TSCBenchmarking::Run method.

This method sets the CPU on which the benchmark will run, warms up the benchmark and your code, runs your code several times, and returns the average time.
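The steps described above (pin to a CPU, warm up, repeat, average) can be sketched as follows. This is not the Run implementation: PinToCpu and AverageTicks are hypothetical names, the pinning uses the Linux-specific sched_setaffinity call, and the clock is passed in as a callable.

```cpp
#include <sched.h>  // sched_setaffinity, CPU_ZERO, CPU_SET (Linux, glibc)

#include <cstdint>

// Pins the calling thread to `cpu`; returns false if the kernel refuses.
inline bool PinToCpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Runs `code` unmeasured `warmup_runs` times, then measures it
// `measured_runs` times and returns the average tick count.
template <typename ReadClock, typename Fn>
std::uint64_t AverageTicks(ReadClock read_clock, Fn&& code,
                           int warmup_runs, int measured_runs) {
    for (int i = 0; i < warmup_runs; ++i) code();
    if (measured_runs <= 0) return 0;
    std::uint64_t total = 0;
    for (int i = 0; i < measured_runs; ++i) {
        const std::uint64_t begin = read_clock();
        code();
        const std::uint64_t end = read_clock();
        total += end - begin;
    }
    return total / static_cast<std::uint64_t>(measured_runs);
}
```

The warm-up runs bring the code and its data into the caches before any timing starts, so the first measured run is not an outlier.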

In addition, you can use the minimalistic benchmarking::TSCBenchmarking::MeasureTime method, which does nothing except read the TSC. It can be used in a hot code path to take raw measurements first and then translate them, in another process, into a more readable format.
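The "record raw values now, translate later" pattern could look like the sketch below. All names here (RawSample, RawLog, SamplesToNanoseconds) are hypothetical, and the ticks_per_ns factor is assumed to come from a separate calibration. The sketch does not call MeasureTime itself, because its exact signature is not shown here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct RawSample {
    std::uint64_t begin_ticks;
    std::uint64_t end_ticks;
};

// Hot path: stores raw counter values in preallocated storage only.
class RawLog {
public:
    explicit RawLog(std::size_t capacity) : capacity_(capacity) {
        samples_.reserve(capacity);
    }

    // Drops the sample and returns false when full, so it never allocates.
    bool Record(std::uint64_t begin_ticks, std::uint64_t end_ticks) {
        if (samples_.size() >= capacity_) return false;
        samples_.push_back({begin_ticks, end_ticks});
        return true;
    }

    const std::vector<RawSample>& samples() const { return samples_; }

private:
    std::size_t capacity_;
    std::vector<RawSample> samples_;
};

// Offline step: converts the raw tick deltas to nanoseconds.
inline std::vector<double> SamplesToNanoseconds(
    const std::vector<RawSample>& samples, double ticks_per_ns) {
    std::vector<double> result;
    result.reserve(samples.size());
    for (const RawSample& s : samples) {
        result.push_back(
            static_cast<double>(s.end_ticks - s.begin_ticks) / ticks_per_ns);
    }
    return result;
}
```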