Port difffolded.pl
jasonrhansen opened this issue · 9 comments
Now that #60 has landed, it would be nice if inferno could generate the differential profile files that this feature expects. Currently you need to use difffolded.pl from upstream FlameGraph. We should port this to Rust.
See Differential Flame Graphs for more info.
I just realized that the output of difffolded.pl
is not sorted. This is not a problem for the Perl version of flamegraph since it always sorts the input lines. But in inferno's implementation of flamegraph the lines are only sorted if there are multiple input files that need to be merged. The assumption was that if only one file is passed, the input will already be sorted. That's not the case when creating differential flamegraphs.
We could deal with this in one of the following ways:
- Diverge from upstream and have our version of diff-folded sort lines before writing them out.
- Always sort the lines in flamegraph like the Perl version does it.
- In flamegraph, detect early if it's a differential and only then sort the lines.
Thoughts?
Hmm, that's a good point... I'm tempted to do (1), but that would mean that we're not compatible with files generated by upstream's difffolded.pl
, which would be sad. Unfortunately, (2) is also pretty dissatisfying as it would mean you need to read the entire input before you can process any of it. I like (3), but I worry that not doing (2) means that there will be weird cases where the user has non-sorted folded output and passes it to us expecting it to just work.
Amidst all these, I think (2) is sadly the right way to go. It will likely give us quite a performance hit since we can no longer do streaming work. Quantifying that would be good (shouldn't be too hard now with #88). At least we can use sort_unstable
, so there's that 🤷♂️
Yeah, it's sad to have to always sort even when we know most of the time the input is already sorted, but I do agree that it would be best to remain as compatible as possible with upstream.
I'll do some benchmarks to see what the performance difference is.
I think it's not even so much the sorting itself, but the collect
we have to do in order to sort.
Yeah, I get that. Here are the benchmarks.
No sorting:
time: [5.3512 ms 5.3765 ms 5.4020 ms]
thrpt: [114.13 MiB/s 114.67 MiB/s 115.21 MiB/s]
Sorting:
time: [5.7031 ms 5.7364 ms 5.7740 ms]
thrpt: [106.77 MiB/s 107.47 MiB/s 108.10 MiB/s]
change:
time: [+7.5874% +8.3936% +9.3059%] (p = 0.00 < 0.05)
thrpt: [-8.5136% -7.7436% -7.0523%]
Performance has regressed.
Okay, so not quite as bad as I'd thought then.
I'll add a PR for the flamegraph benchmark since #88 only added benchmarks for collapse.
That seems like an excellent plan! What input are you using?
tests/data/collapse-perf/results/example-perf-stacks-collapsed.txt