Frame time distribution reporting?
In soft real-time and interactive uses, performance consistency is arguably more important than mean performance.
60 FPS with constant frame times is way better than 200 FPS with a lag spike every couple hundred frames. Apparently, even some quite large games and GUI applications can have quite bad issues with the entire process freezing every couple of seconds when the garbage collector kicks in.
Complication: Nim has multiple memory management strategies that can be used, and I don't know if some of the other languages might too. I suppose these would probably be best exposed as `run_arc.sh`, `run_boehm.sh`, etc., similar to how there's already `run_speed.sh`, `run_mypyc.sh`, `run_lto.sh`, etc.
Obviously, results from this should be taken with a grain of salt the size of a boulder, as small implementation details could completely change them. But still, I could see it as a useful at-a-glance way to judge what you're likely to get out of each language if you jump in without taking special measures, and I haven't seen any existing empirical data or comparison for that.
Possible components:
- Output format? Maybe just dump a JSON array of frame times to disk without any additional processing.
- Option to sample only after a warm-up period? So skip startup, and give JITs time to get going. Hm; I guess if it's implemented as something like `--frametimes 1000`, meaning "keep only the last 1,000 frame times," then we can also avoid ever having to reallocate the internal frame time array. (And then you could do, e.g., `--frametimes 1000 --frames 1200` to get 1,000 frame times after 200 warm-up frames.)
- `./utils/frametimes.py <LANG>/frametimes.json` to bin the times and show a Matplotlib histogram? (A sketch of what this could look like is after this list.)
- Would it be worth it to make `bench.py` report on deviations/quantiles when available too?
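For concreteness, here's a minimal sketch of what that `./utils/frametimes.py` could look like, assuming the emulator dumps a flat JSON array of per-frame durations in seconds; the `--skip` and `--bins` options are made-up illustrations, not existing flags:

```python
#!/usr/bin/env python3
"""Hypothetical sketch: bin logged frame times, print summary stats,
and show a histogram. Assumes a flat JSON array of per-frame durations
in seconds, e.g. [0.0042, 0.0039, ...]; none of this exists yet."""
import argparse
import json
import statistics

import matplotlib.pyplot as plt

parser = argparse.ArgumentParser()
parser.add_argument("logfile", help="JSON array of frame times, in seconds")
parser.add_argument("--skip", type=int, default=0,
                    help="ignore this many warm-up frames from the start")
parser.add_argument("--bins", type=int, default=50)
args = parser.parse_args()

with open(args.logfile) as f:
    times = json.load(f)[args.skip:]

# Report the consistency numbers this issue is actually about:
# quantiles and spread, not just the mean.
quantiles = statistics.quantiles(times, n=100)  # 99 cut points
print(f"frames: {len(times)}")
print(f"mean:   {statistics.mean(times) * 1000:.3f} ms")
print(f"stdev:  {statistics.stdev(times) * 1000:.3f} ms")
print(f"p50:    {quantiles[49] * 1000:.3f} ms")
print(f"p99:    {quantiles[98] * 1000:.3f} ms")

plt.hist([t * 1000 for t in times], bins=args.bins)
plt.xlabel("frame time (ms)")
plt.ylabel("frames")
plt.show()
```

Usage would then be something like `./utils/frametimes.py <LANG>/frametimes.json --skip 200`, though where the log lands on disk is another open question.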
I considered PR'ing this directly or trying to start it myself, but there are enough different ways it could be done (and reasons to do or not do it), and I'm not proficient enough, that it should probably be discussed more thoroughly first.
This is an interesting idea, and yeah, lots of different possibilities to think about... Having different run scripts for different compiler flag sets seems sensible.

(One random thought: might be simpler to have each emulator just log a timestamp for each frame (with `--debug-clock`?), and then have the post-processing utility script do the logic like "ignore the first ${warmup period}"?)
> (One random thought: might be simpler to have each emulator just log a timestamp for each frame (with `--debug-clock`?), and then have the post-processing utility script do the logic like "ignore the first ${warmup period}"?)
Concern with that is that logging arbitrary numbers of frames means you probably need either (1) a dynamically resizable array (C++ `vector`, Python `list`, etc.), which will be bad, and wildly different across languages, in terms of expensive reallocations and copies (if even available), or (2) an IO dump to a stream or file, which will probably be worse.
Having the postprocessing script handle logic like ignoring warmup would be semantically cleaner in most cases. But I also think the right way to implement the performance-sensitive logging would be with an explicitly specified fixed-size buffer anyway. Otherwise, there's no way to tell whether you're really getting lag spikes or your list is just being resized on a particular frame.
So I think ignoring warmup is basically a special case of postprocessing that we conveniently get for free, because we already have `--frames` and we already need a fixed-size logging buffer anyway.
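To make the fixed-size buffer idea concrete, here's a minimal sketch in Python (the class name and methods are made up for illustration; each emulator would implement the equivalent in its own language):

```python
import json

class FrameTimeLog:
    """Hypothetical fixed-size ring buffer for frame times.

    All storage is allocated up front, so recording a frame never
    triggers a reallocation that could itself show up as a lag spike.
    Only the most recent `capacity` frame times are kept, matching
    the proposed `--frametimes N` semantics.
    """

    def __init__(self, capacity):
        self.times = [0.0] * capacity  # preallocated once, never resized
        self.index = 0
        self.count = 0

    def record(self, frame_time):
        # O(1) and allocation-free: overwrite the oldest slot in place.
        self.times[self.index] = frame_time
        self.index = (self.index + 1) % len(self.times)
        self.count = min(self.count + 1, len(self.times))

    def dump(self, path):
        # Reorder oldest-first and write the flat JSON array that the
        # post-processing script would consume.
        if self.count < len(self.times):
            ordered = self.times[:self.count]
        else:
            ordered = self.times[self.index:] + self.times[:self.index]
        with open(path, "w") as f:
            json.dump(ordered, f)
```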