/profiler

live interactive profiler (sampling and tracing)

motivation

  • sampling is good if your problem is visible when you average over time (e.g. perf and flamegraphs)
  • tracing is good if you already have a strong hypothesis of what the problem is (e.g. all recv calls are slow, or this one recv call is slow)
    • static (compile time or manual instrumentation)
      • it's slow and painful to iterate
    • dynamic (run time) e.g. bpf based
      • doesn't have great tooling yet
      • doesn't make use of its unique advantage: run time dynamism
  • no existing end user tool does either of these (AFAIK)
    • dynamic tracing of user code
    • combining sampling and tracing information
  • extra benefits
    • live interaction between user, profiler and running application
    • distributed, can collect from many machines to one

existing tools

  • sampling (visualisation tools mostly based on perf)
  • tracing
    • jaeger
      • code instrumentation
      • manual, static
    • vampir
      • compiler based instrumentation
      • automatic, static
    • tracy
      • code zone/scope instrumentation
      • manual, static
    • dtrace
      • probe/action based
      • some dynamic probes
      • user code probes are manual, static
    • systemtap
      • probe/script based
      • some dynamic probes
      • user code probes are manual, static
  • sampling and tracing
    • bpf
      • fully dynamic user code probes
        • but quite awkward to implement
        • need to process dwarf debug information yourself
        • no end user facing tool offers this yet
      • possible to combine tracing and sampling
        • using perf events and user probes
        • again, no end user facing tool offers this yet (a rough sketch of the combination follows below)
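
As a rough illustration of that last point, here is a minimal sketch using bcc (the Python front end to bpf): a uprobe/uretprobe pair marks threads that are currently inside a traced function, and a perf software event samples user stacks, counting only the stacks taken while the marker is set. The target (libc's recv), the probe and map names, and the 99 Hz sample rate are placeholder choices for illustration, not features of any existing tool.

    from bcc import BPF, PerfType, PerfSWConfig

    bpf_text = r"""
    #include <uapi/linux/ptrace.h>
    #include <uapi/linux/bpf_perf_event.h>

    BPF_HASH(in_call, u32, u64);      // tid -> entry timestamp
    BPF_STACK_TRACE(stacks, 16384);   // user stack traces
    BPF_HASH(counts, int, u64);       // user stack id -> sample count

    // tracing half: mark/unmark the thread on function entry/exit
    int trace_entry(struct pt_regs *ctx) {
        u32 tid = bpf_get_current_pid_tgid();
        u64 ts = bpf_ktime_get_ns();
        in_call.update(&tid, &ts);
        return 0;
    }

    int trace_return(struct pt_regs *ctx) {
        u32 tid = bpf_get_current_pid_tgid();
        in_call.delete(&tid);
        return 0;
    }

    // sampling half: keep only samples taken while the traced call is in flight
    int do_sample(struct bpf_perf_event_data *ctx) {
        u32 tid = bpf_get_current_pid_tgid();
        if (!in_call.lookup(&tid))
            return 0;
        int stackid = stacks.get_stackid(&ctx->regs, BPF_F_USER_STACK);
        if (stackid >= 0)
            counts.increment(stackid);
        return 0;
    }
    """

    b = BPF(text=bpf_text)
    # placeholder target: libc's recv(); any user binary and symbol would do
    b.attach_uprobe(name="c", sym="recv", fn_name="trace_entry")
    b.attach_uretprobe(name="c", sym="recv", fn_name="trace_return")
    b.attach_perf_event(ev_type=PerfType.SOFTWARE,
                        ev_config=PerfSWConfig.CPU_CLOCK,
                        fn_name="do_sample", sample_freq=99)
    # b["counts"] and b["stacks"].walk(stack_id) can then be read from user
    # space and folded into a flamegraph

Because both the program and its attachments are created at run time, the filter can be swapped without recompiling or restarting the target.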

a new kind of interactive profiling

  • sampling with trace information, or decision based on trace information
  • e.g. separate stacks based on whether the duration of a function higher up the stack trace was in the 95th percentile (see the sketch after this list)
  • because every step (sampling, tracing, output) is dynamic (enabled by bpf), you never need to recompile or restart the target application or profiling application
  • this enables interactive profiling, progressively digging into the problem, e.g. (highest level) sample everything -> sample only where a high level function is in its 95th percentile -> sample only when this branch is taken -> (lowest level) trace recv calls at main.cc:1312
  • it would even allow tracing an if statement, a single branch of it, or an arbitrary line of code, rather than just function entry/exit; this would be enabled by dwarf debug info integration
  • rather than fixed time windows (e.g. run for 10 seconds then stop and report), reporting can be streaming (presenting data from the last 10 seconds, or an exponentially weighted moving average over all time)
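
A user-space sketch of the percentile decision and the streaming window, assuming per-call durations and the stack samples collected inside each call are already being fed from the bpf side (e.g. by a uretprobe plus the sampler sketched above); the class name and thresholding details are illustrative only:

    import time
    from collections import Counter, deque

    class PercentileSplitter:
        """Split stack samples by whether their enclosing call was slower than
        the p95 of recent calls, over a sliding window rather than a fixed
        run-then-report cycle."""

        def __init__(self, window_s=10.0, percentile=95):
            self.window_s = window_s
            self.percentile = percentile
            self.calls = deque()    # (timestamp, duration_ns) of recent calls
            self.slow = Counter()   # stack id -> samples inside p95+ calls
            self.fast = Counter()   # stack id -> samples inside other calls

        def _threshold(self, now):
            # drop calls that fell out of the window, then take the p95
            while self.calls and now - self.calls[0][0] > self.window_s:
                self.calls.popleft()
            if not self.calls:
                return None
            durations = sorted(d for _, d in self.calls)
            idx = min(len(durations) - 1,
                      int(len(durations) * self.percentile / 100))
            return durations[idx]

        def record_call(self, duration_ns, sample_stack_ids, now=None):
            """Call when a traced call returns, with the ids of the stack
            samples that landed inside it."""
            now = time.monotonic() if now is None else now
            threshold = self._threshold(now)
            self.calls.append((now, duration_ns))
            bucket = self.slow if (threshold is not None
                                   and duration_ns > threshold) else self.fast
            for sid in sample_stack_ids:
                bucket[sid] += 1

        def difference(self):
            """Counts for a difference flamegraph: samples inside slow calls
            minus samples inside fast calls, per stack id."""
            return {sid: self.slow[sid] - self.fast[sid]
                    for sid in set(self.slow) | set(self.fast)}

The slow/fast counters can be rendered as two flamegraphs (or one difference flamegraph), and because the window slides, the report can refresh continuously instead of only at the end of a fixed run.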

dependencies

progress

  • modularise profile.py, funclatency.py, and funcslower.py from bcc-tools
  • combine profile and funcslower into one script (don't actually need funclatency?)
  • integrate dwarf line number getter (see the pyelftools sketch after this list)
  • add state that is reused between them
  • use dwarf debug info to access local variables (rather than just function arguments)
  • separate samples based on some high level decision
  • for any call taking longer than the 95th percentile, show me a flamegraph of all samples inside those calls
  • generate a difference flamegraph for the samples inside/outside the 95th percentile
  • automatically transpile dwarf location expressions to bpf c to access variables (sketched after this list)
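
For the dwarf line number getter, a minimal sketch with pyelftools: walk each compilation unit's line program and return the first address whose row matches a given file:line, which is where a uprobe for an arbitrary line of code would be placed. The function name is a placeholder, and the 1-based file index assumes DWARF version 4 or earlier:

    from elftools.elf.elffile import ELFFile

    def address_for_line(binary_path, src_file, line_no):
        """Return the first address whose DWARF line-table row is
        src_file:line_no, or None if no row matches."""
        with open(binary_path, "rb") as f:
            elf = ELFFile(f)
            if not elf.has_dwarf_info():
                return None
            dwarf = elf.get_dwarf_info()
            for cu in dwarf.iter_CUs():
                lineprog = dwarf.line_program_for_CU(cu)
                if lineprog is None:
                    continue
                for entry in lineprog.get_entries():
                    state = entry.state
                    if state is None or state.end_sequence:
                        continue
                    # file indices are 1-based up to DWARF v4
                    file_entry = lineprog['file_entry'][state.file - 1]
                    if (file_entry.name.decode() == src_file
                            and state.line == line_no):
                        return state.address
        return None

    # e.g. address_for_line("./a.out", "main.cc", 1312) -> address to probe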
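
For the location expression transpilation, a sketch of the simplest case only: a local variable whose DW_AT_location is a single DW_OP_fbreg, turned into a bpf_probe_read_user() from the frame pointer. It assumes the frame base resolves to the frame pointer (i.e. the target keeps frame pointers and DW_AT_frame_base is register based); a real implementation would evaluate DW_AT_frame_base and handle location lists, registers, and DW_OP_piece. All names here are illustrative:

    from elftools.elf.elffile import ELFFile
    from elftools.dwarf.dwarf_expr import DWARFExprParser

    def fbreg_offset(binary_path, function_name, var_name):
        """Return var_name's DW_OP_fbreg offset inside function_name, or None
        if its location is anything more complicated."""
        with open(binary_path, "rb") as f:
            dwarf = ELFFile(f).get_dwarf_info()
            for cu in dwarf.iter_CUs():
                parser = DWARFExprParser(cu.structs)
                for die in cu.iter_DIEs():
                    if die.tag != 'DW_TAG_subprogram':
                        continue
                    name = die.attributes.get('DW_AT_name')
                    if name is None or name.value.decode() != function_name:
                        continue
                    for child in die.iter_children():
                        if child.tag not in ('DW_TAG_variable',
                                             'DW_TAG_formal_parameter'):
                            continue
                        vname = child.attributes.get('DW_AT_name')
                        loc = child.attributes.get('DW_AT_location')
                        if vname is None or loc is None:
                            continue
                        if vname.value.decode() != var_name:
                            continue
                        if not isinstance(loc.value, (list, bytes)):
                            continue    # location list: out of scope here
                        ops = parser.parse_expr(loc.value)
                        if len(ops) == 1 and ops[0].op_name == 'DW_OP_fbreg':
                            return ops[0].args[0]
        return None

    def read_var_snippet(offset):
        """bpf C to read the variable, assuming the frame base is the frame
        pointer (PT_REGS_FP)."""
        return ("u64 var = 0;\n"
                "bpf_probe_read_user(&var, sizeof(var), "
                "(void *)(PT_REGS_FP(ctx) + (%d)));" % offset)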

future

  • Maybe use plotly dash for a graphical frontend
  • Maybe use kernel density estimation rather than histograms for visualisation and/or automatic discovery
  • Maybe come up with a gdb/radare2/cutter-like UI for interactive probing of program structure (functions, loops, conditions, basic blocks, lexical blocks, etc)