Primary language: C++. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

WindowsPerf

WindowsPerf is a performance profiling tool for Windows on Arm, inspired by Linux perf. Profiling is based on the ARM64 PMU and its hardware counters. WindowsPerf supports the counting model, which obtains aggregate counts of occurrences of specific events, and the sampling model, which determines the frequencies of event occurrences produced by program locations at the function, basic block, and/or instruction levels.

WindowsPerf can instrument Arm CPU performance counters. As of now, it can collect:

  • Core PMU counters for all CPU cores or for specified cores.
  • Uncore PMU counters:
    • Arm DynamIQ Shared Unit (DSU) PMU.
    • DMC-520 Dynamic Memory Controller PMU.
  • Arm Statistical Profiling Extension (SPE).

Currently we support:

  • counting model: WindowsPerf can utilize the Performance Monitoring Unit (PMU) counters from the CPU, DSU, and DMC to capture detailed counting profiles of workloads. By leveraging these counters, WindowsPerf can monitor various performance metrics and events, providing insights into the behavior and efficiency of the system. This comprehensive profiling helps in identifying bottlenecks, optimizing performance, and ensuring that workloads are running efficiently across different components of the system. You can find examples here.
  • sampling model: WindowsPerf can sample CPU Performance Monitoring Unit (PMU) events using two methods: software sampling and hardware sampling. In software sampling, the process is triggered by a PMU counter overflow interrupt request (IRQ), allowing the system to collect data at specific intervals. On the other hand, hardware sampling with the Arm Statistical Profiling Extension (SPE) provides precise sampling directly in hardware. This method captures detailed performance data without the overhead associated with software-based sampling, resulting in more accurate and reliable measurements. You can find examples here.
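To make the sampling model more concrete, the sketch below shows one way sampled program counter values could be attributed to functions to obtain per-function sample frequencies. It is only an illustration of the idea: the `Symbol` struct and the `attribute_samples` and `sample_share` helpers are hypothetical names, not part of WindowsPerf, and a real profiler would also have to resolve modules and symbols.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hypothetical symbol record: a function name and its [start, end) address range.
struct Symbol {
    std::string name;
    std::uint64_t start;
    std::uint64_t end;
};

// Attribute each sampled program counter (PC) to the function containing it
// and return per-function sample counts. PCs that fall outside every range
// are counted under "[unknown]".
std::map<std::string, std::size_t> attribute_samples(
    std::vector<Symbol> symbols, const std::vector<std::uint64_t>& sampled_pcs)
{
    // Sort by start address so each lookup is a binary search.
    std::sort(symbols.begin(), symbols.end(),
              [](const Symbol& a, const Symbol& b) { return a.start < b.start; });

    std::map<std::string, std::size_t> hits;
    for (std::uint64_t pc : sampled_pcs) {
        // Find the first symbol starting after pc; the candidate is the one before it.
        auto it = std::upper_bound(
            symbols.begin(), symbols.end(), pc,
            [](std::uint64_t value, const Symbol& s) { return value < s.start; });
        if (it != symbols.begin() && pc < std::prev(it)->end) {
            ++hits[std::prev(it)->name];
        } else {
            ++hits["[unknown]"];
        }
    }
    return hits;
}

// Share of all samples attributed to one function, as a percentage.
double sample_share(const std::map<std::string, std::size_t>& hits,
                    const std::string& name, std::size_t total_samples)
{
    assert(total_samples > 0);
    auto it = hits.find(name);
    if (it == hits.end()) {
        return 0.0;
    }
    return 100.0 * static_cast<double>(it->second) /
           static_cast<double>(total_samples);
}
```

Whether the samples come from PMU overflow interrupts or from SPE records, the attribution step has the same shape: each sample carries an address, and the per-function share of samples approximates where the sampled event occurs most often.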

Arm Telemetry Solution Integration

The integration of WindowsPerf and the Arm Telemetry Solution is a significant advancement in performance analysis on Windows on Arm. This integration is primarily based on PMU (Performance Monitoring Unit) events, which provide detailed insight into the system’s performance. One of the standout features of WindowsPerf is its implementation of the Arm Topdown Methodology for μarch (microarchitecture) performance analysis. This methodology is tailored to each Arm CPU μarch. It uses PMU events, metrics, and groups of metrics to provide a comprehensive analysis of the system’s performance. Furthermore, WindowsPerf can detect the platform μarch, including Neoverse N1, V1, and N2 CPUs.

The Arm Telemetry Solution also includes a topdown-tool that uses WindowsPerf as its backend on Windows on Arm. This tool applies the top-down methodology to break down CPU performance into hierarchical levels, providing a detailed and systematic approach to performance analysis.

The topdown-tool uses WindowsPerf to access PMU events and metrics on Windows on Arm, enabling it to gather and analyze performance data directly from the hardware. This integration allows the topdown-tool to provide a comprehensive view of the system’s performance, from high-level metrics to low-level, detailed μarch events.
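As a rough illustration of how metrics are built from PMU event counts, the sketch below derives instructions per cycle and cycle-based stall percentages from four aggregate counts. The field names follow Arm PMU event names (CPU_CYCLES, INST_RETIRED, STALL_FRONTEND, STALL_BACKEND), but the `CoreCounts` and `StallMetrics` types and the `derive_metrics` function are hypothetical, and the formulas are simplified assumptions: the exact metric definitions differ per μarch and are given in the Arm telemetry specifications.

```cpp
#include <cassert>
#include <cstdint>

// Aggregate counts as a counting-model run might report them.
// Field names mirror Arm PMU event names; the struct itself is hypothetical.
struct CoreCounts {
    std::uint64_t cpu_cycles;
    std::uint64_t inst_retired;
    std::uint64_t stall_frontend;
    std::uint64_t stall_backend;
};

// Derived metrics: a ratio and two percentages of total cycles.
struct StallMetrics {
    double ipc;                   // instructions retired per cycle
    double frontend_stalled_pct;  // % of cycles with a frontend stall
    double backend_stalled_pct;   // % of cycles with a backend stall
};

// Assumed, simplified formulas; real per-μarch definitions may differ.
StallMetrics derive_metrics(const CoreCounts& c)
{
    assert(c.cpu_cycles > 0);  // avoid dividing by zero
    const double cycles = static_cast<double>(c.cpu_cycles);
    return StallMetrics{
        static_cast<double>(c.inst_retired) / cycles,
        100.0 * static_cast<double>(c.stall_frontend) / cycles,
        100.0 * static_cast<double>(c.stall_backend) / cycles,
    };
}
```

A top-down analysis would start from coarse metrics like these and then drill into groups of more specific events for whichever category dominates.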

WindowsPerf Installation

You can find the latest WindowsPerf installation instructions in INSTALL.md.

WindowsPerf Releases

You can find all binary releases of WindowsPerf (wperf-driver and wperf application) here.

Building WindowsPerf

You can find the latest WindowsPerf build instructions in BUILD.md.

Contributing

When contributing to this repository, please first read the CONTRIBUTING.md file for details on how to contribute to this project.

WindowsPerf Modules

The WindowsPerf solution contains a few projects:

Other directories contain:

  • wperf-common contains code shared between the wperf and wperf-driver projects, mostly data structures describing the IOCTRL binary protocol.
    • Note: the wperf application communicates with wperf-driver via an IOCTRL buffer. A proprietary binary protocol is used to exchange data, commands, and status between the two.
  • wperf-scripts contains various scripts, including testing scripts.
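The note about the binary protocol can be pictured with a small sketch of a fixed-size header followed by a payload, packed into one byte buffer and parsed back. Everything here is hypothetical: `RequestHeader`, its fields, and the `encode_request` and `decode_request` helpers are illustrative names and do not reflect the actual layout defined in wperf-common. On Windows, such a buffer is typically handed to a driver with the Win32 `DeviceIoControl` call; that step is omitted so the sketch stays portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical fixed-size request header; NOT the real WindowsPerf layout.
struct RequestHeader {
    std::uint32_t command;       // hypothetical command identifier
    std::uint32_t core_index;    // hypothetical target core
    std::uint32_t payload_size;  // number of payload bytes after the header
};

// Pack the header and payload into one contiguous buffer.
std::vector<std::uint8_t> encode_request(const RequestHeader& hdr,
                                         const std::vector<std::uint8_t>& payload)
{
    assert(hdr.payload_size == payload.size());
    std::vector<std::uint8_t> buffer(sizeof(RequestHeader) + payload.size());
    std::memcpy(buffer.data(), &hdr, sizeof(RequestHeader));
    if (!payload.empty()) {
        std::memcpy(buffer.data() + sizeof(RequestHeader),
                    payload.data(), payload.size());
    }
    return buffer;
}

// Parse a buffer back into header and payload.
// Returns false if the buffer is too short or its length disagrees
// with the payload size declared in the header.
bool decode_request(const std::vector<std::uint8_t>& buffer,
                    RequestHeader& hdr, std::vector<std::uint8_t>& payload)
{
    if (buffer.size() < sizeof(RequestHeader)) {
        return false;
    }
    std::memcpy(&hdr, buffer.data(), sizeof(RequestHeader));
    if (buffer.size() - sizeof(RequestHeader) != hdr.payload_size) {
        return false;
    }
    payload.assign(
        buffer.begin() + static_cast<std::ptrdiff_t>(sizeof(RequestHeader)),
        buffer.end());
    return true;
}
```

Keeping such structures in a shared directory means the application and the driver compile against the same definitions, which is presumably why wperf-common exists as a separate module.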

Project resources

For more information regarding the project visit WindowsPerf Wiki.
