rib/gputop

Enable capturing OA metrics via MDAPI

rib opened this issue · 0 comments

rib commented

MDAPI is an API, available on Windows and Linux, for capturing OA metrics. Tools such as GPA and VTune use it to collect GPU metrics.

A couple of benefits I can see we'd get from optionally being able to read our metrics via MDAPI are:

  • A stepping stone towards making GPU Top run on Windows, so it can be a more broadly useful tool for developers.
  • Once we can run on Windows, we'd be able to check that our metrics are consistent across both platforms for the same hardware and workloads. How closely they agree would give us more input on whether our driver (and the Windows driver) are working well.
  • In the short term it could help further test the Linux implementation of MDAPI we use for enabling GPA on Linux, by checking for consistency between reading metrics via MDAPI and reading them from the kernel directly.

This would involve updating gputop-perf.c to allow us to conditionally dlopen() libmd.so and use MDAPI to open and read a stream of metrics instead of using the i915 perf kernel interface directly.
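As a rough sketch of the conditional loading described above (not the actual gputop-perf.c code), the backend choice could look something like this. The enum, the function name `select_metrics_backend()`, and the idea of passing the entry point symbol as a string are all hypothetical; only `dlopen()`, `dlsym()`, `dlerror()` and `dlclose()` are real calls. If the library or the symbol can't be found we simply fall back to the i915 perf path.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Which path is used to read OA metrics (hypothetical names). */
enum metrics_backend {
    BACKEND_I915_PERF, /* existing path: i915 perf kernel interface */
    BACKEND_MDAPI,     /* optional path: MDAPI loaded at runtime */
};

/*
 * Try to load an MDAPI library and resolve a single entry point.
 *
 * On success the library handle and the resolved symbol are returned
 * through handle_out/entry_out and BACKEND_MDAPI is returned. On any
 * failure both outputs are set to NULL and we fall back to
 * BACKEND_I915_PERF.
 */
static enum metrics_backend
select_metrics_backend(const char *lib_name, const char *entry_symbol,
                       void **handle_out, void **entry_out)
{
    assert(lib_name && entry_symbol && handle_out && entry_out);

    *handle_out = NULL;
    *entry_out = NULL;

    void *handle = dlopen(lib_name, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "MDAPI unavailable (%s), using i915 perf\n",
                dlerror());
        return BACKEND_I915_PERF;
    }

    void *entry = dlsym(handle, entry_symbol);
    if (!entry) {
        fprintf(stderr, "%s not found in %s, using i915 perf\n",
                entry_symbol, lib_name);
        dlclose(handle);
        return BACKEND_I915_PERF;
    }

    *handle_out = handle;
    *entry_out = entry;
    return BACKEND_MDAPI;
}
```

The caller would pass "libmd.so" as `lib_name` plus whatever symbol the MDAPI headers export for opening a metrics device (I'd expect something like `OpenMetricsDevice`, but that's an assumption to check against the headers). On older glibc versions this needs linking with `-ldl`.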

My hope is that we can map metric sets discovered via MDAPI to the metric sets we already know about. Once we have mapped a set to a metric set GUID, we won't need to evaluate MDAPI normalization equations at runtime and can instead reuse the oa-xyz.c code we generate at build time. Initially we can do this mapping based on the metric set symbol names alone, but it could also be worth cross-referencing the B/C counters described by MDAPI against our own counter descriptions for the same set, to be more confident that the raw counters are really based on the same hardware configuration.
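To make the mapping idea a bit more concrete, here is a minimal sketch, assuming we keep a table of known sets keyed by symbol name. All struct and function names are hypothetical, the GUIDs and register values are placeholders rather than real configurations, and the symbol names are only illustrative. The first function does the name-based lookup; the second is one possible way to cross-check a B/C counter configuration reported by MDAPI against ours.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One register write in a B/C counter configuration. */
struct reg_write {
    uint32_t addr;
    uint32_t value;
};

/* A metric set we already know about from the build-time generated code. */
struct known_metric_set {
    const char *symbol_name;            /* e.g. as reported by MDAPI */
    const char *guid;                   /* placeholder GUIDs below */
    const struct reg_write *b_counter_regs;
    size_t n_b_counter_regs;
};

/* Placeholder register values, not a real hardware configuration. */
static const struct reg_write render_basic_regs[] = {
    { 0x1000, 0x0 },
    { 0x1004, 0x800 },
};

static const struct known_metric_set known_sets[] = {
    { "RenderBasic", "00000000-0000-0000-0000-000000000001",
      render_basic_regs, 2 },
    { "ComputeBasic", "00000000-0000-0000-0000-000000000002",
      NULL, 0 },
};

/* Map a symbol name to a known set, or NULL if there is no match. */
static const struct known_metric_set *
find_known_set(const char *symbol_name)
{
    assert(symbol_name != NULL);

    for (size_t i = 0; i < sizeof(known_sets) / sizeof(known_sets[0]); i++) {
        if (strcmp(known_sets[i].symbol_name, symbol_name) == 0)
            return &known_sets[i];
    }
    return NULL;
}

/* Check that an externally described config matches ours, in order. */
static bool
b_counter_config_matches(const struct known_metric_set *set,
                         const struct reg_write *regs, size_t n_regs)
{
    assert(set != NULL);

    if (set->n_b_counter_regs != n_regs)
        return false;
    for (size_t i = 0; i < n_regs; i++) {
        if (set->b_counter_regs[i].addr != regs[i].addr ||
            set->b_counter_regs[i].value != regs[i].value)
            return false;
    }
    return true;
}
```

A set that fails either check would just be skipped (or reported) rather than falling back to evaluating the MDAPI equations. Comparing register writes in strict order is probably too strict in practice, but it keeps the sketch simple.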

Notably, the above mapping limits how much of the MDAPI interface we would depend on and effectively test, but the interest here is more in comparing the data obtained from the hardware and kernel.