Yves is a suite of profilers, C++ benchmarks, smelt test lists, and a Python library for performance testing.
To get started, you'll need to install Poetry, the Python package manager:
pipx install poetry
After that completes, install Yves with:
poetry install
If you are on Linux, you'll also need to set kernel.perf_event_paranoid to 0 to access the perf counters:
sudo sysctl kernel.perf_event_paranoid=0
To run all of Yves' benchmarks, launch smelt via:
poetry run smelt execute benches/all.smelt.yaml
If you are running on macOS, you'll need to execute the benchmarks with sudo in order to access the performance counters:
sudo poetry run smelt execute benches/all.smelt.yaml
To view the output of the benchmarks, execute:
poetry run yves/analysis/all.py
This will launch a textual application that can be exited via Ctrl-C.
NOTE: to change the graph being shown, press Tab and then use the arrow keys.
In more detail, Yves is a collection of C++ benchmarks, smelt test lists, and analysis scripts.
Smelt includes simple macOS and Linux profilers. These are non-sampling profilers that record performance counter events for the duration of a program, similar to perf stat.
I wrote these profilers for two reasons:
- I wanted to do performance analysis on macOS
- I wanted this project to be as self-contained as possible
Each profiler is compiled to a shared object and leverages the LD_PRELOAD "trick" to start and stop counting events when each benchmark starts and ends. When the program exits, the performance counts are dumped to a JSON file named "counters.json" in the current working directory.
The counters dumped currently are:
- cycles
- instructions
- branches
- branch-misses
Support for other counters could be brought up with a little elbow grease.
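As a rough sketch of the approach on Linux (not the repo's actual implementation): a constructor/destructor pair injected via LD_PRELOAD opens the counters above with perf_event_open and dumps the totals at exit. The file name and the exact counters.json layout below are assumptions for illustration only.

    // profiler_sketch.cpp -- illustrative LD_PRELOAD + perf_event_open profiler (Linux only).
    // Not the actual implementation; the counters.json layout here is an assumption.
    #include <linux/perf_event.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    namespace {

    struct Counter { const char* name; std::uint64_t config; int fd; long long value; };

    Counter counters[] = {
        {"cycles",        PERF_COUNT_HW_CPU_CYCLES,          -1, 0},
        {"instructions",  PERF_COUNT_HW_INSTRUCTIONS,        -1, 0},
        {"branches",      PERF_COUNT_HW_BRANCH_INSTRUCTIONS, -1, 0},
        {"branch-misses", PERF_COUNT_HW_BRANCH_MISSES,       -1, 0},
    };

    constexpr std::size_t kNumCounters = sizeof(counters) / sizeof(counters[0]);

    // Runs before the benchmark's main() because the .so is LD_PRELOADed.
    __attribute__((constructor)) void start_counting() {
        for (auto& c : counters) {
            perf_event_attr attr{};
            attr.type = PERF_TYPE_HARDWARE;
            attr.size = sizeof(attr);
            attr.config = c.config;
            attr.exclude_kernel = 1;
            attr.exclude_hv = 1;
            // pid = 0 (this process), cpu = -1 (any), no group, no flags; counting starts immediately.
            c.fd = static_cast<int>(syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0));
        }
    }

    // Runs when the benchmark exits; reads each counter and dumps the totals.
    __attribute__((destructor)) void dump_counts() {
        std::FILE* out = std::fopen("counters.json", "w");
        if (!out) return;
        std::fputs("{\n", out);
        for (std::size_t i = 0; i < kNumCounters; ++i) {
            if (counters[i].fd >= 0)
                read(counters[i].fd, &counters[i].value, sizeof(counters[i].value));
            std::fprintf(out, "  \"%s\": %lld%s\n", counters[i].name, counters[i].value,
                         i + 1 < kNumCounters ? "," : "");
        }
        std::fputs("}\n", out);
        std::fclose(out);
    }

    }  // namespace

Built as a shared object, something along these lines would be injected with LD_PRELOAD=./profiler.so ./benchmark, so it counts from before main() until process exit.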
All of the benchmarks are written in C++20 and leverage constexpr as much as possible. The benchmarks try to avoid using any system calls, so that they can be re-used for the bring-up of new designs; the exception is the use of "malloc" for large regions of memory. Any benchmark that explicitly measures syscalls should be labeled as such.
Additionally, every benchmark is a standalone, self-contained file that does not require any headers.
Each benchmark is written as a "template" with preprocessor defines that can be overridden. For example, in the random pointer chase benchmarks, the working set size of the memory footprint being chased through is set via the RSS preprocessor define.
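A minimal sketch of what this pattern could look like (a hypothetical file, not one of the actual benchmarks; the real pointer chases randomise the chain, a fixed stride is used here for brevity):

    // pointer_chase.cpp -- illustrative sketch of the "template" style.
    // RSS sets the working set size in bytes and can be overridden on the compiler command line.
    #ifndef RSS
    #define RSS (1u << 20)  // default to a 1 MiB working set
    #endif

    #include <cstddef>
    #include <cstdlib>

    int main() {
        constexpr std::size_t kEntries = RSS / sizeof(void*);
        // malloc is the one allowed "system" dependency: grab the working set.
        void** chase = static_cast<void**>(std::malloc(kEntries * sizeof(void*)));
        if (!chase) return 1;

        // Build a cyclic chain through the buffer; an odd stride visits every entry
        // when kEntries is a power of two.
        constexpr std::size_t kStride = 4096 / sizeof(void*) + 1;
        for (std::size_t i = 0; i < kEntries; ++i)
            chase[i] = &chase[(i + kStride) % kEntries];

        // Chase pointers: each load depends on the previous one, so the loop is latency bound.
        void** p = chase;
        for (std::size_t i = 0; i < 100'000'000; ++i)
            p = static_cast<void**>(*p);

        // Keep the final pointer live so the chase is not optimised away.
        volatile void* sink = p;
        (void)sink;
        std::free(chase);
        return 0;
    }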
By default, all benchmarks and profilers are compiled with zig 0.13.0, using its cc / c++ compiler functionality. I will bring up support for other compilers via godbolt if anyone asks.
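For example (the file name and define value are illustrative, not actual paths in the repo), a single benchmark could be built standalone with its working set overridden like so:

    zig c++ -std=c++20 -O2 -DRSS=4194304 pointer_chase.cpp -o pointer_chase

which would compile the pointer chase with a 4 MiB working set.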
The smelt test lists provide a mechanism to compile, run, and profile benchmarks on your current machine. The "top level" test list is located at benches/all.smelt.yaml -- from there you can inspect the other test lists.
The smelt analysis scripts, which can be run after a test list is executed, parse the output of each benchmark and plot the data in the terminal.
There are a few features I'd like to add, in no particular order:
- Running benchmarks on an open CPU or SoC (Ibex or SweRV)
- Adding directed instruction generation to measure latency and throughput of instructions
- Given a machine-readable ISA spec and a lot of labour, we could use this infra to re-create the Agner Fog instruction tables for RISC-V/ARM/x86
- Add many more benchmarks, for example:
  - all of the data prefetch patterns described at https://people.ucsc.edu/~hlitz/papers/asplos2020.pdf
  - icache capacity tests
  - memcpy and memset tests
  - simple GEMM/SGEMM tests