/FlameGraph

stack trace visualizer

Primary LanguagePerl

Flame Graphs visualize hot-CPU code-paths.

Using DTrace, see: http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs/
Using perf_events or SystemTap, see: http://dtrace.org/blogs/brendan/2012/03/17/linux-kernel-performance-flame-graphs/
Using XCode Instruments, see: http://schani.wordpress.com/2012/11/16/flame-graphs-for-instruments/

These can be created in three steps:

   1. Capture stacks
   2. Fold stacks
   3. flamegraph.pl


1. Capture stacks

Stack samples can be captured using DTrace, perf_events or SystemTap.

Using DTrace to capture 60 seconds of kernel stacks at 997 Hertz:

# dtrace -x stackframes=100 -n 'profile-997 /arg0/ { @[stack()] = count(); } tick-60s { exit(0); }' -o out.kern_stacks

Using DTrace to capture 60 seconds of user-level stacks for PID 12345 at 97 Hertz:

# dtrace -x ustackframes=100 -n 'profile-97 /PID == 12345 && arg1/ { @[ustack()] = count(); } tick-60s { exit(0); }' -o out.user_stacks

Using DTrace to capture 60 seconds of user-level stacks, including while time is spent in the kernel, for PID 12345 at 97 Hertz:

# dtrace -x ustackframes=100 -n 'profile-97 /PID == 12345/ { @[ustack()] = count(); } tick-60s { exit(0); }' -o out.user_stacks

Switch ustack() for jstack() if the application has a ustack helper to include translated frames (eg, node.js frames; see: http://dtrace.org/blogs/dap/2012/01/05/where-does-your-node-program-spend-its-time/).  The rate for user-level stack collection is deliberately slower than kernel, which is especially important when using jstack() as it performs additional work to translate frames.

2. Fold stacks

Use the stackcollapse programs to fold stack samples into single lines.  The programs provided are:

- stackcollapse.pl: for DTrace stacks
- stackcollapse-perf.pl: for perf_events "perf script" output
- stackcollapse-stap.pl: for SystemTap stacks
- stackcollapse-instruments.pl: for XCode Instruments

Usage example:

$ ./stackcollapse.pl out.kern_stacks > out.kern_folded

The output looks like this:

unix`_sys_sysenter_post_swapgs 1401
unix`_sys_sysenter_post_swapgs;genunix`close 5
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf 85
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;c2audit`audit_closef 26
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;c2audit`audit_setf 5
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`audit_getstate 6
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`audit_unfalloc 2
unix`_sys_sysenter_post_swapgs;genunix`close;genunix`closeandsetf;genunix`closef 48
[...]


3. flamegraph.pl

Use flamegraph.pl to render a SVG.

$ ./flamegraph.pl out.kern_folded > kernel.svg

An advantage of having the folded input file (and why this is separate to flamegraph.pl) is that you can use grep for functions of interest. Eg:

$ grep cpuid out.kern_folded | ./flamegraph.pl > cpuid.svg