
Lightweight recording and sampling of performance counters for specific code segments directly from your C++ application.

Primary language: C++. License: GNU Lesser General Public License v3.0 (LGPL-3.0).

perf-cpp: Access Performance Counters from C++ Applications

perf-cpp is a powerful C++ library that provides direct access to hardware performance counters from within your application. It enables precise event counting and sampling of specific code segments, and it can link sampled data (e.g., memory addresses) with application-specific details (e.g., class instances).

Key Features

  • Count Hardware Events: Integrate performance monitoring into your application. Configure, start, and stop hardware counters to profile specific code segments.
  • Sampling: Leverage sampling to record performance data periodically, e.g., instruction pointers, memory addresses, access latency, branches, and more.
  • Customizable Event Configuration: Use built-in hardware events (e.g., cycles, instructions, cache-misses) as well as events specific to your underlying CPU. Additionally, define and use metrics (quantitative measurements such as cycles per instruction) to gain deeper insights into performance and efficiency.
  • Practical Examples: Jumpstart your implementation with the diverse collection of examples that demonstrate practical applications of the library.

Quick Start

Get up and running with perf-cpp in seconds:

# Clone the repository
git clone https://github.com/jmuehlig/perf-cpp.git

# Switch to the repository folder
cd perf-cpp

# Optional: Switch to the latest stable version
git checkout v0.9.0

# Build the library (in build/)
cmake . -B build -DBUILD_EXAMPLES=1
cmake --build build

# Optional: Build examples (in build/examples/bin)
cmake --build build --target examples

For detailed building instructions, including how to integrate perf-cpp into your CMake projects, visit our build guide.

Usage Examples

Count Hardware Events

Quickly set up hardware event monitoring:

#include <iostream>
#include <perfcpp/event_counter.h>

/// Initialize the counter
auto counters = perf::CounterDefinition{};
auto event_counter = perf::EventCounter{ counters };

/// Specify hardware events to count
event_counter.add({"seconds", "instructions", "cycles", "cache-misses"});

/// Run the workload
event_counter.start();
your_workload(); /// <-- Your code to profile
event_counter.stop();

/// Print the result to the console
const auto result = event_counter.result();
for (const auto& [event_name, value] : result)
{
    std::cout << event_name << ": " << value << std::endl;
}

Possible output:

seconds:      0.0955897 
instructions: 5.92087e+07
cycles:       4.70254e+08
cache-misses: 1.35633e+07
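The feature list mentions metrics such as cycles per instruction (CPI). perf-cpp can define such metrics itself; purely to illustrate the arithmetic, here is a hand-rolled sketch that is not the library's metric API. It assumes the event/value pairs have been copied into a `std::map`, and the function name `cycles_per_instruction` is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical helper (not part of perf-cpp): derives cycles per instruction
// from event names mapped to counter values, e.g. copied from the result above.
std::optional<double> cycles_per_instruction(const std::map<std::string, double>& values)
{
    const auto cycles = values.find("cycles");
    const auto instructions = values.find("instructions");

    // The metric is undefined without both events or with zero instructions.
    if (cycles == values.end() || instructions == values.end() || instructions->second == 0.0)
    {
        return std::nullopt;
    }
    return cycles->second / instructions->second;
}
```

With the possible output above (4.70254e+08 cycles and 5.92087e+07 instructions), this yields a CPI of roughly 7.94.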

For further details, including how to count events in parallel settings, visit our guide on recording events.
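Because `start()` and `stop()` must bracket exactly the code to profile, an RAII wrapper can make that pairing automatic, even on early returns or exceptions. The following is a hypothetical helper, not part of perf-cpp; it assumes only that the wrapped object exposes `start()` and `stop()` as in the examples here, and that `stop()` does not throw (it runs in a destructor).

```cpp
#include <cassert>

// Hypothetical helper (not part of perf-cpp): starts a recorder (e.g., an event
// counter or sampler) on construction and stops it when the scope ends.
template <typename Recorder>
class ScopedRecording
{
public:
    explicit ScopedRecording(Recorder& recorder) : _recorder(recorder) { _recorder.start(); }

    // Assumes stop() does not throw, since destructors are implicitly noexcept.
    ~ScopedRecording() { _recorder.stop(); }

    // Copying would stop the recorder twice, so forbid it.
    ScopedRecording(const ScopedRecording&) = delete;
    ScopedRecording& operator=(const ScopedRecording&) = delete;

private:
    Recorder& _recorder;
};
```

For example, `{ ScopedRecording guard{ event_counter }; your_workload(); }` records only the enclosed block, after which `event_counter.result()` can be read as before.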

Record Samples

Implement detailed sampling with control over the recorded content:

#include <iostream>
#include <perfcpp/sampler.h>

/// Create the sampler
auto counters = perf::CounterDefinition{};
auto sampler = perf::Sampler{ counters };

/// Specify when a sample is recorded: every 4000th cycle
sampler.trigger("cycles", perf::Period{4000U});

/// Specify what metadata is included in a sample: time, CPU ID, and instruction pointer
sampler.values()
    .time(true)
    .cpu_id(true)
    .instruction_pointer(true);

/// Run the workload
sampler.start();
your_workload(); /// <-- Your code to profile
sampler.stop();

/// Print the samples to the console
const auto samples = sampler.result();
for (const auto& sample_record : samples)
{
    const auto time = sample_record.time().value();
    const auto cpu_id = sample_record.cpu_id().value();
    const auto instruction = sample_record.instruction_pointer().value();
    
    std::cout 
        << "Time = " << time << " | CPU = " << cpu_id
        << " | Instruction = 0x" << std::hex << instruction << std::dec
        << std::endl;
}

Possible output:

Time = 365449130714033 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449130913157 | CPU = 8 | Instruction = 0x64af7417c75c
Time = 365449131112591 | CPU = 8 | Instruction = 0x5a6e84b2075c
Time = 365449131312005 | CPU = 8 | Instruction = 0x64af7417c75c 
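Raw samples are usually aggregated before interpretation. As a minimal sketch, the code below ranks instruction pointers by how often they were sampled, a common first step toward locating hot code. The `SampleSummary` struct and `hottest_instructions` function are hypothetical and independent of perf-cpp; the assumption is that their fields are filled from `sample_record.time()`, `cpu_id()`, and `instruction_pointer()` as in the loop above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical plain struct mirroring the values printed above.
struct SampleSummary
{
    std::uint64_t time;
    std::uint32_t cpu_id;
    std::uint64_t instruction_pointer;
};

// Counts how often each instruction pointer was sampled and returns
// (address, count) pairs, most frequent first; ties are ordered by address.
std::vector<std::pair<std::uint64_t, std::size_t>> hottest_instructions(const std::vector<SampleSummary>& samples)
{
    std::unordered_map<std::uint64_t, std::size_t> counts;
    for (const auto& sample : samples)
    {
        ++counts[sample.instruction_pointer];
    }

    std::vector<std::pair<std::uint64_t, std::size_t>> ranking(counts.begin(), counts.end());
    std::sort(ranking.begin(), ranking.end(), [](const auto& lhs, const auto& rhs) {
        return lhs.second != rhs.second ? lhs.second > rhs.second : lhs.first < rhs.first;
    });
    return ranking;
}
```

Applied to the four samples above, each of the two addresses would be reported with a count of two.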

For further details, such as which data can be included in samples, visit our sampling guide.

Advanced Examples

We include a comprehensive collection of examples demonstrating the advanced capabilities of perf-cpp, such as counting events in parallel settings and sampling memory accesses.

All code examples are available in the examples/ folder.

Further Reading

  • Full Documentation: Explore detailed guides on every feature of perf-cpp.
  • Examples: Learn how to set up different features from code examples.
  • Changelog: Stay updated with the latest changes and improvements.

System Requirements

  • C++ Standard: Requires support for C++17 features.
  • CMake Version: 3.10 or higher.
  • Linux Kernel Version: 4.0 or newer (note that some features require a newer kernel).
  • perf_event_paranoid Setting: Adjust as needed to allow access to performance counters (see Adjusting perf_event_paranoid Value below).
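An application can check the kernel requirement at startup before creating counters. The sketch below uses hypothetical helpers, not part of perf-cpp; it reads the release string via the POSIX `uname()` call and compares only the major and minor version numbers.

```cpp
#include <cassert>
#include <cstdio>
#include <optional>
#include <string>
#include <sys/utsname.h>
#include <utility>

// Parses "major.minor" from a kernel release string such as "5.15.0-91-generic".
std::optional<std::pair<int, int>> parse_kernel_version(const std::string& release)
{
    int major = 0;
    int minor = 0;
    if (std::sscanf(release.c_str(), "%d.%d", &major, &minor) != 2)
    {
        return std::nullopt;
    }
    return std::make_pair(major, minor);
}

// Returns true if the running kernel is at least required_major.required_minor
// (e.g., 4.0 for the baseline listed above); false if it cannot be determined.
bool kernel_at_least(int required_major, int required_minor)
{
    utsname info{};
    if (uname(&info) != 0)
    {
        return false;
    }
    const auto version = parse_kernel_version(info.release);
    return version.has_value() && *version >= std::make_pair(required_major, required_minor);
}
```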

Adjusting perf_event_paranoid Value

The perf_event_paranoid setting controls access to performance counters:

  • -1: No restrictions (full access).
  • 0: Allow access to CPU-specific data, but no raw tracepoint samples.
  • 1: Allow both user- and kernel-level measurements (default before Linux 4.6).
  • >= 2: Allow only user-level measurements (default since Linux 4.6).

Checking the Current Value

cat /proc/sys/kernel/perf_event_paranoid
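The same check can be done from within an application, for example to decide whether kernel code can be included in measurements. This is a minimal sketch with hypothetical function names, not part of perf-cpp; it assumes an unprivileged process, since privileged processes (e.g., root or CAP_PERFMON) are not limited by this setting.

```cpp
#include <cassert>
#include <fstream>
#include <optional>
#include <string>

// Reads the current perf_event_paranoid level; std::nullopt if the file is
// missing or does not start with an integer.
std::optional<int> read_perf_event_paranoid(const std::string& path = "/proc/sys/kernel/perf_event_paranoid")
{
    std::ifstream file{path};
    int level = 0;
    if (file >> level)
    {
        return level;
    }
    return std::nullopt;
}

// Per the levels listed above: at 1 or below, unprivileged processes may also
// measure kernel code; at 2 or above, only user-level measurements are allowed.
bool kernel_profiling_allowed(int level)
{
    return level <= 1;
}
```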

Changing the Value Temporarily

sudo sysctl -w kernel.perf_event_paranoid=-1

Note: To make this change permanent, edit /etc/sysctl.conf and add kernel.perf_event_paranoid = -1.

Contribute and Contact

We welcome contributions and feedback to make perf-cpp even better. For feature requests, feedback, or bug reports, please reach out via our issue tracker or submit a pull request.

Alternatively, you can email me: jan.muehlig@tu-dortmund.de.


Further Profiling Projects

While perf-cpp is dedicated to providing developers with clear insights into application performance, it is part of a broader ecosystem of tools that facilitate performance analysis. Below is a non-exhaustive list of some other valuable profiling projects:

  • PAPI offers access not only to CPU performance counters but also to a variety of other hardware components including GPUs, I/O systems, and more.
  • Likwid is a collection of command-line tools for performance monitoring and benchmarking, accompanied by an extensive wiki.
  • PerfEvent provides lightweight access to performance counters, facilitating streamlined performance monitoring.
  • Intel's Instrumentation and Tracing Technology allows applications to manage the collection of trace data effectively when used in conjunction with Intel VTune Profiler.
  • For those who prefer a more hands-on approach, the perf_event_open system call can be utilized directly without any wrappers.
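For a sense of what using the system call directly involves, here is a minimal sketch following the usage pattern from the perf_event_open(2) man page; it is independent of perf-cpp. It counts user-space instructions retired by a callable. The helper names `open_perf_event` and `count_user_instructions` are illustrative; glibc provides no wrapper for this system call, so it is invoked via `syscall()`.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>
#include <optional>

// glibc offers no wrapper for perf_event_open, so the raw system call is used.
static long open_perf_event(perf_event_attr* attr, pid_t pid, int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

// Counts user-space instructions retired by the calling thread while running
// `workload`; std::nullopt if the counter cannot be opened or read.
template <typename Workload>
std::optional<std::uint64_t> count_user_instructions(Workload&& workload)
{
    perf_event_attr attr{};
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;       // Created disabled; enabled right before the workload.
    attr.exclude_kernel = 1; // User space only, so perf_event_paranoid = 2 suffices.
    attr.exclude_hv = 1;

    // pid = 0, cpu = -1: measure this thread on whichever CPU it runs.
    const int fd = static_cast<int>(open_perf_event(&attr, 0, -1, -1, 0));
    if (fd == -1)
    {
        return std::nullopt;
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    workload();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    std::uint64_t count = 0;
    const ssize_t bytes = read(fd, &count, sizeof(count));
    close(fd);
    if (bytes != static_cast<ssize_t>(sizeof(count)))
    {
        return std::nullopt;
    }
    return count;
}
```

The function returns `std::nullopt` when the counter cannot be opened, for example because of the perf_event_paranoid setting or a virtual machine without access to the PMU.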

Resources about (Perf-) Profiling

This is a non-exhaustive list of academic research papers and blog articles (feel free to add to it via pull request, including your own work).

Academic Papers

Blog Posts