/profiler

live interactive profiler (sampling and tracing)

motivation

  • sampling is good if your problem is visible when you average over time (e.g. perf and flamegraphs)
  • tracing is good if you already have a strong hypothesis of what the problem is (e.g. all recv calls are slow, or this one recv call is slow)
    • static (compile time or manual instrumentation)
      • it's slow and painful to iterate
    • dynamic (run time) e.g. bpf based
      • doesn't have great tooling yet
      • doesn't make use of its unique advantage: run time dynamism
  • no existing end user tool does either of these (AFAIK)
    • dynamic tracing of user code
    • combining sampling and tracing information
  • extra benefits
    • live interaction between user, profiler and running application
    • distributed, can collect from many machines to one

existing tools

  • sampling (visualisation tools mostly based on perf)
  • tracing
    • jaeger
      • code instrumentation
      • manual, static
    • vampir
      • compiler based instrumentation
      • automatic, static
    • tracy
      • code zone/scope instrumentation
      • manual, static
    • dtrace
      • probe/action based
      • some dynamic probes
      • user code probes are manual, static
    • systemtap
      • probe/script based
      • some dynamic probes
      • user code probes are manual, static
  • sampling and tracing
    • bpf
      • fully dynamic user code probes
        • but quite awkward to implement
        • need to process dwarf debug information yourself
        • no end user facing tool offers this yet
      • possible to combine tracing and sampling
        • using perf events and user probes
        • again, no end user facing tool offers this yet (a rough sketch of the combination follows below)
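
As a rough illustration of that last point, here is a minimal sketch using bcc (the Python front end to bpf): a uprobe/uretprobe pair marks threads that are currently inside a traced function, and a perf software event samples user stacks, counting only the stacks taken while the marker is set. The target (libc's recv), the probe and map names, and the 99 Hz sample rate are placeholder choices for illustration, not features of any existing tool.

    from bcc import BPF, PerfType, PerfSWConfig

    bpf_text = r"""
    #include <uapi/linux/ptrace.h>
    #include <uapi/linux/bpf_perf_event.h>

    BPF_HASH(in_call, u32, u64);      // tid -> entry timestamp
    BPF_STACK_TRACE(stacks, 16384);   // user stack traces
    BPF_HASH(counts, int, u64);       // user stack id -> sample count

    // tracing half: mark/unmark the thread on function entry/exit
    int trace_entry(struct pt_regs *ctx) {
        u32 tid = bpf_get_current_pid_tgid();
        u64 ts = bpf_ktime_get_ns();
        in_call.update(&tid, &ts);
        return 0;
    }

    int trace_return(struct pt_regs *ctx) {
        u32 tid = bpf_get_current_pid_tgid();
        in_call.delete(&tid);
        return 0;
    }

    // sampling half: keep only samples taken while the traced call is in flight
    int do_sample(struct bpf_perf_event_data *ctx) {
        u32 tid = bpf_get_current_pid_tgid();
        if (!in_call.lookup(&tid))
            return 0;
        int stackid = stacks.get_stackid(&ctx->regs, BPF_F_USER_STACK);
        if (stackid >= 0)
            counts.increment(stackid);
        return 0;
    }
    """

    b = BPF(text=bpf_text)
    # placeholder target: libc's recv(); any user binary and symbol would do
    b.attach_uprobe(name="c", sym="recv", fn_name="trace_entry")
    b.attach_uretprobe(name="c", sym="recv", fn_name="trace_return")
    b.attach_perf_event(ev_type=PerfType.SOFTWARE,
                        ev_config=PerfSWConfig.CPU_CLOCK,
                        fn_name="do_sample", sample_freq=99)
    # b["counts"] and b["stacks"].walk(stack_id) can then be read from user
    # space and folded into a flamegraph

Because both the program and its attachments are created at run time, the filter can be swapped without recompiling or restarting the target.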

a new kind of interactive profiling

  • sampling with trace information, or decision based on trace information
  • e.g. separate stacks based on whether the duration of a function higher up the stack trace was in the 95th percentile (see the sketch after this list)
  • because every step (sampling, tracing, output) is dynamic (enabled by bpf), you never need to recompile or restart the target application or profiling application
  • this enables interactive profiling, progressively digging into the problem, e.g. (highest level) sample everything -> sample only where a high level function is in its 95th percentile -> sample only when this branch is taken -> (lowest level) trace recv calls at main.cc:1312
  • it would even allow tracing an if statement, a single branch of it, or an arbitrary line of code, rather than just function entry/exit; this would be enabled by dwarf debug info integration
  • rather than fixed time windows (e.g. run for 10 seconds then stop and report), reporting can be streaming (presenting data from the last 10 seconds, or an exponentially weighted moving average over all time)
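
A user-space sketch of the percentile decision and the streaming window, assuming per-call durations and the stack samples collected inside each call are already being fed from the bpf side (e.g. by a uretprobe plus the sampler sketched above); the class name and thresholding details are illustrative only:

    import time
    from collections import Counter, deque

    class PercentileSplitter:
        """Split stack samples by whether their enclosing call was slower than
        the p95 of recent calls, over a sliding window rather than a fixed
        run-then-report cycle."""

        def __init__(self, window_s=10.0, percentile=95):
            self.window_s = window_s
            self.percentile = percentile
            self.calls = deque()    # (timestamp, duration_ns) of recent calls
            self.slow = Counter()   # stack id -> samples inside p95+ calls
            self.fast = Counter()   # stack id -> samples inside other calls

        def _threshold(self, now):
            # drop calls that fell out of the window, then take the p95
            while self.calls and now - self.calls[0][0] > self.window_s:
                self.calls.popleft()
            if not self.calls:
                return None
            durations = sorted(d for _, d in self.calls)
            idx = min(len(durations) - 1,
                      int(len(durations) * self.percentile / 100))
            return durations[idx]

        def record_call(self, duration_ns, sample_stack_ids, now=None):
            """Call when a traced call returns, with the ids of the stack
            samples that landed inside it."""
            now = time.monotonic() if now is None else now
            threshold = self._threshold(now)
            self.calls.append((now, duration_ns))
            bucket = self.slow if (threshold is not None
                                   and duration_ns > threshold) else self.fast
            for sid in sample_stack_ids:
                bucket[sid] += 1

        def difference(self):
            """Counts for a difference flamegraph: samples inside slow calls
            minus samples inside fast calls, per stack id."""
            return {sid: self.slow[sid] - self.fast[sid]
                    for sid in set(self.slow) | set(self.fast)}

The slow/fast counters can be rendered as two flamegraphs (or one difference flamegraph), and because the window slides, the report can refresh continuously instead of only at the end of a fixed run.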

dependencies

progress

  • modularise profile.py, funclatency.py, and funcslower.py from bcc-tools
  • combine profile and funcslower into one script (don't actually need funclatency?)
  • integrate dwarf line number getter (see the pyelftools sketch after this list)
  • add state that is reused between them
  • use dwarf debug info to access local variables (rather than just function arguments)
  • separate samples based on some high level decision
  • for any call taking longer than the 95th percentile, show me a flamegraph of all samples inside those calls
  • generate a difference flamegraph for the samples inside/outside the 95th percentile
  • automatically transpile dwarf location expressions to bpf c to access variables (sketched after this list)
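
For the dwarf line number getter, a minimal sketch with pyelftools: walk each compilation unit's line program and return the first address whose row matches a given file:line, which is where a uprobe for an arbitrary line of code would be placed. The function name is a placeholder, and the 1-based file index assumes DWARF version 4 or earlier:

    from elftools.elf.elffile import ELFFile

    def address_for_line(binary_path, src_file, line_no):
        """Return the first address whose DWARF line-table row is
        src_file:line_no, or None if no row matches."""
        with open(binary_path, "rb") as f:
            elf = ELFFile(f)
            if not elf.has_dwarf_info():
                return None
            dwarf = elf.get_dwarf_info()
            for cu in dwarf.iter_CUs():
                lineprog = dwarf.line_program_for_CU(cu)
                if lineprog is None:
                    continue
                for entry in lineprog.get_entries():
                    state = entry.state
                    if state is None or state.end_sequence:
                        continue
                    # file indices are 1-based up to DWARF v4
                    file_entry = lineprog['file_entry'][state.file - 1]
                    if (file_entry.name.decode() == src_file
                            and state.line == line_no):
                        return state.address
        return None

    # e.g. address_for_line("./a.out", "main.cc", 1312) -> address to probe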
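
For the location expression transpilation, a sketch of the simplest case only: a local variable whose DW_AT_location is a single DW_OP_fbreg, turned into a bpf_probe_read_user() from the frame pointer. It assumes the frame base resolves to the frame pointer (i.e. the target keeps frame pointers and DW_AT_frame_base is register based); a real implementation would evaluate DW_AT_frame_base and handle location lists, registers, and DW_OP_piece. All names here are illustrative:

    from elftools.elf.elffile import ELFFile
    from elftools.dwarf.dwarf_expr import DWARFExprParser

    def fbreg_offset(binary_path, function_name, var_name):
        """Return var_name's DW_OP_fbreg offset inside function_name, or None
        if its location is anything more complicated."""
        with open(binary_path, "rb") as f:
            dwarf = ELFFile(f).get_dwarf_info()
            for cu in dwarf.iter_CUs():
                parser = DWARFExprParser(cu.structs)
                for die in cu.iter_DIEs():
                    if die.tag != 'DW_TAG_subprogram':
                        continue
                    name = die.attributes.get('DW_AT_name')
                    if name is None or name.value.decode() != function_name:
                        continue
                    for child in die.iter_children():
                        if child.tag not in ('DW_TAG_variable',
                                             'DW_TAG_formal_parameter'):
                            continue
                        vname = child.attributes.get('DW_AT_name')
                        loc = child.attributes.get('DW_AT_location')
                        if vname is None or loc is None:
                            continue
                        if vname.value.decode() != var_name:
                            continue
                        if not isinstance(loc.value, (list, bytes)):
                            continue    # location list: out of scope here
                        ops = parser.parse_expr(loc.value)
                        if len(ops) == 1 and ops[0].op_name == 'DW_OP_fbreg':
                            return ops[0].args[0]
        return None

    def read_var_snippet(offset):
        """bpf C to read the variable, assuming the frame base is the frame
        pointer (PT_REGS_FP)."""
        return ("u64 var = 0;\n"
                "bpf_probe_read_user(&var, sizeof(var), "
                "(void *)(PT_REGS_FP(ctx) + (%d)));" % offset)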

future

  • Maybe use plotly dash for a graphical frontend
  • Maybe use kernel density estimation rather than histograms for visualisation and/or automatic discovery
  • Maybe come up with a gdb/radare2/cutter-like UI for interactive probing of program structure (functions, loops, conditions, basic blocks, lexical blocks, etc)