Goniometer

Goniometer is a work-in-progress profiling tool for ROCm, specifically targeting Linux.

A goniometer is a tool sometimes used to measure angles of features in roc(k) samples.

Building

Dependencies

Goniometer is written in Zig and requires version 0.13 to build. The HSA headers are also required. The build file automatically searches for them in the standard include paths as well as /opt/rocm/include, which is where the default ROCm distribution installs them. Alternatively, they can be taken from a ROCR-Runtime checkout by passing --search-prefix path/to/ROCR-Runtime/src/inc to the build command.
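Before building, it can help to check which directory actually provides the headers. The sketch below is a hypothetical helper, not part of goniometer: the function name find_hsa_include and the header path hsa/hsa.h are assumptions made for illustration, and the build file may look for the headers differently.

```shell
# Hypothetical helper: print the first directory that contains hsa/hsa.h.
# Both the function name and the header path are illustrative assumptions.
find_hsa_include() {
    for dir in "$@"; do
        if [ -f "$dir/hsa/hsa.h" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    # None of the candidate directories contained the header.
    return 1
}
```

For example, `find_hsa_include /usr/include /opt/rocm/include` prints the first matching directory, or returns a non-zero status if neither contains the header.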

Compiling

The project can be compiled by running zig build in the root directory. The produced binaries are placed in zig-out/bin and zig-out/lib by default, though this path can be overridden by passing --prefix <path> to the build command.
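The two build variants described above can be wrapped in a small shell function. This is only a sketch: build_goniometer is a hypothetical name, and it assumes that zig is on the PATH and that the function is called from the repository root.

```shell
# Hypothetical wrapper around the build commands described above.
# Usage: build_goniometer [install-prefix]
build_goniometer() {
    if [ -n "${1:-}" ]; then
        # Install into a custom prefix instead of the default output directory.
        zig build --prefix "$1"
    else
        # Default: binaries are placed in zig-out/bin and zig-out/lib.
        zig build
    fi
}
```

Calling `build_goniometer` runs a plain zig build, while `build_goniometer /some/prefix` passes that path through as --prefix.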

Usage

Currently, goniometer can be used to gather Radeon GPU Profiler (RGP) compatible traces for gfx1030-based GPUs. These traces contain only the elements necessary to view instruction timing, which shows, for each instruction of a kernel, the number of cycles it took to execute.

Goniometer currently exposes itself as an HSA tool. The AMD HSA runtime (ROCR-Runtime) loads HSA tools named in the HSA_TOOLS_LIB environment variable, so set it to the path of libgoniometer.so when executing a ROCm HIP program. All kernels are traced, and the corresponding trace is saved as dump-<n>.rgp, where <n> is an arbitrary number representing the GPU that the trace was gathered from.
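As a minimal sketch of this workflow, the helper below runs a single command with HSA_TOOLS_LIB set. The function name run_with_goniometer is hypothetical, and the program being traced is whatever HIP executable you pass in.

```shell
# Hypothetical helper: run one command with goniometer loaded as an HSA tool.
# Usage: run_with_goniometer <path/to/libgoniometer.so> <program> [args...]
run_with_goniometer() {
    lib="$1"
    shift
    # env sets the variable for this command only, not for the calling shell.
    env HSA_TOOLS_LIB="$lib" "$@"
}
```

Assuming the default install prefix, a call might look like `run_with_goniometer zig-out/lib/libgoniometer.so ./my_hip_program`, where my_hip_program stands in for your own executable. The resulting dump-<n>.rgp files can then be opened in Radeon GPU Profiler.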

Internals

On Linux, the ROCm runtime uses the "Architected Queuing Language" (AQL) to schedule work on the GPU. This is distinctly different from PM4, the traditional command stream accepted by AMD GPUs. There is no (known) way to configure performance counters and SQTT tracing via AQL, but fortunately there is an escape hatch: an HSA extension packet that enables PM4 execution via an AQL queue. rocprof itself also uses this packet for its performance tracking, which means that we can reuse the code from Mesa, AMDPAL, and rocprof to configure the GPU to gather the right information.

Resources on profiling AMD GPUs

Some relevant information for gathering tracing information can be found in the following resources:

  • The AMDPAL driver offers the most complete public implementation for profiling AMD GPUs. In particular, look in gpaSession.cpp and the calls it makes. This project also has header definitions for the .rgp file format.
  • xgl interacts with the GpaSession from an SQTT layer. sqtt_layer.cpp is interesting in particular, as well as sqtt_rgp_annotations.h, which contains some information about the SQTT event format.
  • Mesa has some tracing functionality. radv_sqtt.c and ac_rgp.c are useful references.
  • HSA headers show how to interact with the HSA runtime.
  • rocprof is ROCm's official profiling tool. Unfortunately, it does not support gathering SQTT traces (or, in fact, any GPU above gfx9).