Add support for profiling benchmarks and reporting results to ReBenchDB
smarr opened this issue · 2 comments
Looking at changes in benchmark numbers is unfortunately rarely very insightful by itself.
To understand what benchmarks spend their time on, it would be useful to add support for profiling.
Once upon a time, we already had support for it (for details, see #18 and #9; the code was removed in 6e6e251).
At this point, I am looking for support for function-level profiling of interpreters with perf, Xcode Instruments, or perhaps Java's Flight Recorder.
Most urgent for me is the ability to profile the executors. One may also want to profile the benchmarks themselves. The difference is the level at which profiling is done: the VM level or the application level.
Desired Features
- make profiling information available where we analyze performance
- define profiling commands/parameters for executors and benchmark suites
- add a profiling execution mode or experiment setup to use the profiling commands/parameters
- collect profiling information, extract the basic data and send to ReBenchDB for storage
ReBenchDB Mockup
An integration in ReBenchDB could include a new unfoldable section, which shows the basic profile. In this case, it's showing the result of:
perf record -g -F 9999 --call-graph lbr ./som-native-interp-ast -cp Smalltalk:Examples/Benchmarks/LanguageFeatures Examples/Benchmarks/BenchmarkHarness.som Dispatch 10 0 20
perf report -g graph --no-children --stdio
Once we have the data, we may also want a feature to compare profiles, similar to comparing performance. Profiling data is often relative to the overall run time, so it might be necessary to take the actual run time into account when judging differences. For instance, we should avoid showing an increase where the overall time actually decreased and only the relative share increased.
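To illustrate why the run time matters here, a minimal sketch follows. The function name `compare_profiles` and the input layout (a mapping from function name to percentage of total time, plus a total run time in milliseconds) are assumptions made for this example, not ReBenchDB's data model. The numbers are made up.

```python
def compare_profiles(base_pct, base_total_ms, change_pct, change_total_ms):
    """Compare two profiles by absolute time, not only by relative share.

    base_pct / change_pct: dict of function name -> percent of total run time.
    Returns a dict of function name ->
        (delta in percentage points, delta in milliseconds).
    """
    result = {}
    for name in sorted(set(base_pct) | set(change_pct)):
        b = base_pct.get(name, 0.0)
        c = change_pct.get(name, 0.0)
        abs_base = b / 100.0 * base_total_ms
        abs_change = c / 100.0 * change_total_ms
        result[name] = (c - b, abs_change - abs_base)
    return result


# Hypothetical numbers: the run got faster overall, but `dispatch`
# now takes a larger share of the (smaller) total.
deltas = compare_profiles({"dispatch": 40.0, "gc": 60.0}, 1000.0,
                          {"dispatch": 50.0, "gc": 50.0}, 600.0)
rel, abs_ms = deltas["dispatch"]
```

In this made-up case, the relative share of `dispatch` goes up while its absolute time goes down, which is exactly the situation a comparison view should not flag as a regression.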
Design Considerations
Integration with Benchmarking
For the seamless integration with benchmarking, we need to be able to match benchmark data with profiling data.
This means that, internally, things need to end up having the same RunId.
That is, a specific profiling Run needs to be identified by the command line of the original benchmark Run.
Currently, we use those RunIds also to store data, track progress, etc.
It seems like I should probably leave the handling of RunIds alone, and track completion differently, if at all.
One way of doing it would be to have a different way of executing things.
rebench.executor.Executor works together with the RunScheduler to identify the runs to be executed and to compose the final command line.
When composing the final command line for profiling, we need to consider the details for the profiler. This could perhaps be realized as a gauge_adapter?
Though, how do I track completion? One way might simply be a different data store, where only the details needed for completion are tracked, and possibly the profiling results.
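A rough sketch of how this could fit together, under a few assumptions: `PERF_PREFIX`, `profiling_cmdline`, and `ProfileCompletion` are hypothetical names, not existing ReBench classes, and the perf flags are simply the ones from the invocation shown earlier in this issue. The idea is to wrap the already composed benchmark command line with the profiler, and to key the separate completion store by the original benchmark command line so that profile and benchmark data can be matched later.

```python
import shlex

# Hypothetical profiler prefix, copied from the perf invocation above.
PERF_PREFIX = ["perf", "record", "-g", "-F", "9999", "--call-graph", "lbr"]


def profiling_cmdline(benchmark_cmdline, prefix=PERF_PREFIX):
    """Wrap an already composed benchmark command line with a profiler."""
    return list(prefix) + shlex.split(benchmark_cmdline)


class ProfileCompletion:
    """Tracks profiled runs separately from the benchmark data store.

    Runs are keyed by the command line of the original benchmark run,
    which is what identifies a run for matching with benchmark data.
    """

    def __init__(self):
        self._done = {}

    def mark_done(self, benchmark_cmdline, profile_data=None):
        self._done[benchmark_cmdline] = profile_data

    def is_done(self, benchmark_cmdline):
        return benchmark_cmdline in self._done


cmd = ("./som-native-interp-ast -cp Smalltalk:Examples/Benchmarks/LanguageFeatures "
       "Examples/Benchmarks/BenchmarkHarness.som Dispatch 10 0 20")
store = ProfileCompletion()
argv = profiling_cmdline(cmd)
store.mark_done(cmd)
```

Keeping the store separate means the existing RunId handling and benchmark data files stay untouched, at the cost of a second place where progress is recorded.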
Machine Setup, Denoise
For benchmarking, we may want to reduce interference, including possible profiling interrupts, as much as possible.
For profiling on the other hand, we may want to configure the machine mostly for profiling.
I don't know whether these settings make a practical difference for benchmarks, if no profiling is actually done.
Though, I guess there might be a difference?
So, in the unlikely event that there is, one may want to run benchmarks and profiling with different machine setups.
At the moment, we run denoise at the start, before running benchmarks, and then disable it afterwards. Thus, we don't do it before every benchmark.
To keep it like this, we need to keep profiling and benchmarking separate.
But since the benchmarking and profiling configurations likely result in the same experiments, which ReBench currently doesn't handle, this is likely a good idea anyway.
TODO
- add a basic implementation supporting perf to ReBench
- parse data and send compact representation to ReBenchDB
- instead of a stats summary, show perhaps the first 3-4 lines of the profile in the summary after a ReBench run
- add support for other profilers, perhaps just for running. May need ways to define output files, as well as profiler selection
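For the parsing and summary items, a minimal sketch could look like the following. It assumes the top-level entries of `perf report --no-children --stdio` have the form `percent, command, shared object, [type] symbol`; the exact layout may differ between perf versions, and commands containing spaces are not handled. The names `parse_perf_report` and `summary_lines`, as well as the sample text, are made up for illustration and are not actual perf output.

```python
import re

# Matches top-level entries such as:
#   "    45.12%  cmd  binary  [.] symbol"
# Call-chain lines such as "|--30.00%--callee" do not match, because they
# do not start with a number and the percentage is followed by dashes.
ENTRY = re.compile(
    r"^\s*(\d+(?:\.\d+)?)%\s+(\S+)\s+(\S+)\s+\[(.)\]\s+(.+?)\s*$")


def parse_perf_report(text, top_n=None):
    """Extract a compact list of entries from perf report's stdio output."""
    entries = []
    for line in text.splitlines():
        if line.startswith("#"):  # header and comment lines
            continue
        m = ENTRY.match(line)
        if m:
            entries.append({
                "percent": float(m.group(1)),
                "shared_object": m.group(3),
                "symbol": m.group(5),
            })
    entries.sort(key=lambda e: e["percent"], reverse=True)
    return entries if top_n is None else entries[:top_n]


def summary_lines(entries):
    """Format entries for a short summary after a run."""
    return [f"{e['percent']:6.2f}%  {e['symbol']}" for e in entries]


# Hand-written sample resembling the report layout (hypothetical data).
SAMPLE = """\
# Overhead  Command  Shared Object  Symbol
    45.12%  interp   interp-bin     [.] Interpreter_dispatch
            |
            |--30.00%--Interpreter_send
    20.50%  interp   libc.so.6      [.] malloc
"""

result = parse_perf_report(SAMPLE, top_n=3)
lines = summary_lines(result)
```

The list of small dictionaries is one possible compact representation to send to ReBenchDB; the call-graph details are dropped on purpose in this sketch.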
Notes on invoking profiling with tools other than perf:
- Xcode:
xcrun xctrace record --template 'Time Profiler' --output tr2.trace --launch -- /Users/smarr/Projects/FastStart/truffleruby/mxbuild/truffleruby-native/languages/ruby/bin/ruby --experimental-options --engine.Compilation=false harness.rb MicroDispatchBase 200 40
- Java's Flight Recorder needs the following parameters:
-XX:+UnlockCommercialFeatures -XX:+FlightRecorder -XX:StartFlightRecording=delay=10s,duration=10d,name=fr-recording2,filename=fr-recording2.jfr,settings=profile
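Since these tools need the output file in different places (an `--output` argument for xctrace, a `filename=` key for Flight Recorder), profiler selection probably needs a small per-profiler function. The sketch below only reassembles the flags from the two notes above; the function names and parameters are hypothetical, and whether all flags are needed depends on the tool and JVM version.

```python
def xctrace_args(output, launch_cmd, template="Time Profiler"):
    """Compose an xctrace invocation that launches the given command."""
    return ["xcrun", "xctrace", "record",
            "--template", template,
            "--output", output,
            "--launch", "--"] + list(launch_cmd)


def flight_recorder_args(name, filename, delay="10s", duration="10d",
                         settings="profile"):
    """Compose the JVM arguments for a Flight Recorder recording."""
    opts = {"delay": delay, "duration": duration, "name": name,
            "filename": filename, "settings": settings}
    recording = ",".join(f"{key}={value}" for key, value in opts.items())
    return ["-XX:+UnlockCommercialFeatures", "-XX:+FlightRecorder",
            f"-XX:StartFlightRecording={recording}"]


jfr = flight_recorder_args("fr-recording2", "fr-recording2.jfr")
xct = xctrace_args("tr2.trace", ["ruby", "harness.rb", "MicroDispatchBase", "200", "40"])
```

Note that the Flight Recorder arguments are JVM arguments and go after the `java` executable, while xctrace wraps the whole command, so the two cannot be treated as the same kind of prefix.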
Compilation Changes
- GraalVM native image compilation may or may not need some of the following arguments:
-H:-DeleteLocalSymbols -g
Some useful links, also to web-based profile inspectors:
- https://www.markhansen.co.nz/profiler-uis/
- https://profiler.firefox.com/
- https://www.speedscope.app/
One may want to keep the raw profile data around for inspection in an IDE or with local tools. For longer-term archival, though, we probably need to keep a more compact representation.