Running several counters concurrently
zudov opened this issue · 3 comments
perf can only monitor a specific OS process running on a specific CPU core (or on all cores). It is unaware of Haskell's RTS and its OS threads.
I expect that running several counters concurrently may give strange, confusing results. Running a counter test at the same time as other (non-cpu-instruction-counter) tests will also be confusing.
Now, some test runners (e.g. tasty) do parallel test execution by default. This may be a great source of confusion for an unaware user.
I have several ideas of varying complexity that could help here, but ultimately we have to play around and investigate this.
- Add a visible notice to the README telling users not to run counters concurrently.
- Make a global lock that is taken by startInstructionCounter:
  - if the lock is taken, the next startInstructionCounter can fail with a meaningful error message;
  - or it could just wait until the lock is released, which will sequentialize cpu-counter tests;
  - BUT: all of this seems hacky and won't help if you concurrently run non-cpu-instruction-counter tests.
- We can investigate how to actually make it work concurrently:
  - startInstructionCounter can return a Handle that allows working with this specific counter, tracking information related to it;
  - we can use forkOn to run on a specific capability, which usually corresponds to a core. It's implementation-dependent, but we only work on Linux, so it's probably fine;
  - but there's probably a more reliable way to fork onto a specific core; I don't know.
- I scanned through a manpage and noticed interesting variables like PERF_SAMPLE_ID, PERF_FORMAT_ID and PERF_SAMPLE_GROUP. I didn't look any closer yet, but maybe these can be used to reliably track several counters. This stackoverflow question may be related, but I didn't read it closely.
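Regarding those manpage flags: the perf_event_open(2) manpage describes how PERF_FORMAT_ID changes what read() returns. Below is a minimal sketch, assuming a group leader opened with read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID and no other read_format bits, and a read() buffer that has already been split into native-endian 64-bit words. decodeGroupWithIds and countForId are hypothetical helpers, not part of cpu-instruction-counter. The ids only say which event a value belongs to; they do not by themselves separate Haskell threads.

```haskell
import Data.Word (Word64)

-- Hypothetical decoder for a group read with
-- read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID (no time/lost fields).
-- Layout per perf_event_open(2): nr, then nr pairs of (value, id).
-- Returns (event id, counter value) pairs, or Nothing if the number of
-- words does not match nr.
decodeGroupWithIds :: [Word64] -> Maybe [(Word64, Word64)]
decodeGroupWithIds [] = Nothing
decodeGroupWithIds (nr : rest)
  | toInteger (length rest) /= 2 * toInteger nr = Nothing
  | otherwise = Just (pairs rest)
  where
    pairs (value : eventId : more) = (eventId, value) : pairs more
    pairs _ = []

-- Look up one event's count by the id the kernel assigned to it.
countForId :: Word64 -> [Word64] -> Maybe Word64
countForId eventId buf = decodeGroupWithIds buf >>= lookup eventId
```

For example, countForId 8 [2, 1000, 7, 2500, 8] evaluates to Just 2500: two events, with ids 7 and 8, counted 1000 and 2500 instructions respectively.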
I can only be sure about the first option (warning users in the README). In any case, cpu-instruction-counter works only on Linux and uses FFI, so the best practice should be that all instruction-counting tests/benchmarks live in a separate executable, compiled with +RTS -N1, which eliminates the problem.
@zudov Great points, thanks!
> Add a visible notice to README telling users not to run counters concurrently
Good idea, I just did it with 077539f.
> if the lock is taken, next startInstructionCounter can fail with meaningful error message
That sounds like a good idea until we have cleared up how exactly parallel usage behaves.
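A minimal sketch of such a lock, covering both the fail-fast variant and the waiting variant from the original list. All names here (counterLock, withCounterLockOrFail, withCounterLockWait) are hypothetical and not part of cpu-instruction-counter. The lock only serialises code that goes through these wrappers inside one process, so it does nothing about other tests running at the same time.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, putMVar, takeMVar, tryTakeMVar)
import Control.Exception (bracket_, finally, mask)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical process-wide lock: a full MVar means "no counter running".
{-# NOINLINE counterLock #-}
counterLock :: MVar ()
counterLock = unsafePerformIO (newMVar ())

-- Fail-fast: if another counter section holds the lock, return an error
-- instead of running the action.
withCounterLockOrFail :: IO a -> IO (Either String a)
withCounterLockOrFail action = mask $ \restore -> do
  acquired <- tryTakeMVar counterLock
  case acquired of
    Nothing -> pure (Left "an instruction counter is already running in this process")
    Just () -> (Right <$> restore action) `finally` putMVar counterLock ()

-- Waiting: block until the lock is free, which serialises counter sections.
-- Re-entering it from inside a locked section deadlocks.
withCounterLockWait :: IO a -> IO a
withCounterLockWait = bracket_ (takeMVar counterLock) (putMVar counterLock ())
```

The fail-fast variant takes the lock under mask and releases it with finally, so an asynchronous exception cannot leave the lock held.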
We probably want to do that locking against what's returned by perfEventOpenHwInstructions, though. That is the function that chooses (in its C implementation) to record events for all threads. It would be legitimate to obtain an event FD that doesn't do that (e.g. one that only listens to events on a particular thread) and then call startInstructionCounter in parallel on two such event counters.
So I think in general it's best to expose both an API that lets you do everything conveniently from Haskell and one that's safe against common errors (such as accidentally running perf invocations in parallel).
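One way to read this suggestion, as a sketch: a raw layer that does no checking, and a safe layer that only serialises counters observing all threads while letting thread-scoped counters run in parallel. Counter, Scope, withCounterRaw and withCounterSafe are hypothetical stand-ins (Counter plays the role of the value perfEventOpenHwInstructions returns), and withCounterRaw is a placeholder that just runs the action instead of touching a real perf FD.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, putMVar, tryTakeMVar)
import Control.Exception (finally, mask)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical description of what an event FD was opened to observe.
data Scope = AllThreads | OneThread Int -- Int: an OS thread id (illustrative)
  deriving (Eq, Show)

-- Hypothetical stand-in for what perfEventOpenHwInstructions returns.
data Counter = Counter { counterFd :: Int, counterScope :: Scope }
  deriving Show

-- Lock shared only by counters that observe all threads.
{-# NOINLINE allThreadsLock #-}
allThreadsLock :: MVar ()
allThreadsLock = unsafePerformIO (newMVar ())

-- Raw layer: no checks. A real version would start the counter on
-- counterFd before the action and read/stop it afterwards.
withCounterRaw :: Counter -> IO a -> IO a
withCounterRaw _counter action = action

-- Safe layer: thread-scoped counters run freely, while two all-threads
-- counters are never allowed to overlap.
withCounterSafe :: Counter -> IO a -> IO (Either String a)
withCounterSafe counter action = case counterScope counter of
  OneThread _ -> Right <$> withCounterRaw counter action
  AllThreads -> mask $ \restore -> do
    acquired <- tryTakeMVar allThreadsLock
    case acquired of
      Nothing ->
        pure (Left ("all-threads counter already active; refusing fd " ++ show (counterFd counter)))
      Just () ->
        (Right <$> restore (withCounterRaw counter action))
          `finally` putMVar allThreadsLock ()
```

Nesting two AllThreads counters yields Left from the inner call, while a OneThread counter can be nested inside either kind.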
> startInstructionCounter can return a Handle that will allow to work with this specific counter, tracking information related to it
That one I don't quite understand. perfEventOpenHwInstructions is what returns such a handle.
> we can use forkOn to run on specific capability which usually corresponds to a core. It's implementation dependent, but we only work on Linux so it's probably fine
This may not be sufficient in general. What happens if the forkOn'ed f calls forkOn itself with another CPU (or just forkIO)?
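The concern can be observed with threadCapability from Control.Concurrent, whose documentation states that a thread is locked to a capability only if it was created with forkOn. In this sketch (probeForkOn is a hypothetical name), the forkOn'ed thread reports itself as locked, but a child it starts with plain forkIO does not, so the scheduler may migrate that child to another capability, and with -N greater than 1 that means another OS thread. Separately, a capability is not a CPU core: unless OS-level affinity is set, the OS thread serving a capability can run on any core.

```haskell
import Control.Concurrent
  (forkIO, forkOn, myThreadId, newEmptyMVar, putMVar, takeMVar, threadCapability)

-- Start a thread on capability n with forkOn; inside it, start a child
-- with forkIO and report (capability, locked?) for both parent and child.
probeForkOn :: Int -> IO ((Int, Bool), (Int, Bool))
probeForkOn n = do
  result <- newEmptyMVar
  _ <- forkOn n $ do
    parent <- myThreadId >>= threadCapability
    childResult <- newEmptyMVar
    _ <- forkIO (myThreadId >>= threadCapability >>= putMVar childResult)
    child <- takeMVar childResult
    putMVar result (parent, child)
  takeMVar result
```

The capability numbers depend on the RTS settings and scheduling; the locked flags are the interesting part: True for the forkOn'ed parent, False for its forkIO child.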
> the best practice should be that all instruction counting tests/benchmarks live in separate executable, that's compiled with +RTS -N1 which eliminates the problem.
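That's not accurate: even with -N1 you may have 30 threads running. In the -threaded RTS, a safe FFI call releases its capability so that Haskell code can keep running on another OS thread (a new pthread is spawned if no idle worker is available), no matter what you give for -N (see docs).

Only the non-threaded RTS provides the guarantee you're speaking of.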
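Following that advice, a dedicated counting executable could refuse to start under the threaded RTS. A minimal sketch using rtsSupportsBoundThreads (True when the program is linked with -threaded) and getNumCapabilities from Control.Concurrent; counterRtsCheck and guardCounterRts are hypothetical names.

```haskell
import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

-- Pure check: accept only the non-threaded RTS, because with -threaded
-- even -N1 does not stop safe FFI calls from running on extra OS threads.
counterRtsCheck :: Either String ()
counterRtsCheck
  | rtsSupportsBoundThreads =
      Left "threaded RTS detected; relink this executable without -threaded"
  | otherwise = Right ()

-- Call this first thing in the counting executable's main.
guardCounterRts :: IO ()
guardCounterRts = case counterRtsCheck of
  Right () -> pure ()
  Left msg -> do
    caps <- getNumCapabilities
    hPutStrLn stderr (msg ++ " (running with " ++ show caps ++ " capabilities)")
    exitFailure
```

Because rtsSupportsBoundThreads reflects how the program was linked rather than the -N value, this check catches a -threaded build even when it is run with +RTS -N1.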