jonhoo/inferno

Port stackcollapse-bpftrace

Closed this issue · 7 comments

Given how cool bpftrace is, it seems like a great idea to also port stackcollapse-bpftrace into inferno! Looking at its code, this should be a pretty straightforward exercise, since eBPF already does most of the collapsing for us, and ustack already emits lines that are nearly in the right format.

Hey @jonhoo,

I stumbled upon this project and I'm not sure I understand it yet, let alone this issue 😬, but I'd like to contribute.

How should I start, and what's the expected outcome for this?

Hi @felippemr! I'd recommend you start either with the video where this started or read up on flamegraph on your own!

It does look like parsing the output of bpftrace should be pretty easy given the code in stackcollapse-bpftrace.pl. I'm struggling to come up with the bpftrace command that generates the kind of file that gets fed to the stackcollapse script, though (since I don't really know Perl, I think I need to generate some sample input to play around with as opposed to just reading the original code). Do you know what the magic bpftrace incantations are to produce the kind of file stackcollapse-bpftrace.pl is designed to ingest?

To get you started, something like:

$ sudo bpftrace -e 'profile:hz:99 { @[kstack()] = count(); }'

There's some example output here, and here's some stuff I just got:

@[
    pipe_poll+5
    do_sys_poll+594
    __se_sys_poll+44
    do_syscall_64+91
    entry_SYSCALL_64_after_hwframe+68
]: 1
@[
    _raw_spin_unlock_irq+19
    finish_task_switch+132
    __sched_text_start+675
    schedule+50
    futex_wait_queue_me+187
    futex_wait+317
    do_futex+711
    __se_sys_futex+312
    do_syscall_64+91
    entry_SYSCALL_64_after_hwframe+68
]: 1
@[
    _nv031488rm+19
]: 2
@[]: 5
@[
    cpuidle_enter_state+185
    do_idle+535
    cpu_startup_entry+25
    start_kernel+1320
    secondary_startup_64+164
]: 89
@[
    cpuidle_enter_state+185
    do_idle+535
    cpu_startup_entry+25
    start_secondary+426
    secondary_startup_64+164
]: 982

(You can also use ustack to get the user part of the stack, but then you should probably also be running some user-space program that has debug symbols, and limit the trace to its PID.)
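For anyone who wants to play with that sample, here is a minimal Rust sketch of how output shaped like the above could be folded into `frame;frame;frame count` lines. It is based only on the sample shown here, not on a reading of stackcollapse-bpftrace.pl, so the details are assumptions: the function name `collapse` is made up for illustration, frames are assumed to be listed leaf-first (so they are reversed), the `+offset` suffix is dropped, and entries with an empty stack such as `@[]: 5` are simply skipped.

```rust
/// Fold bpftrace map output (as in the sample above) into `a;b;c count` lines.
/// Hypothetical helper for illustration; not inferno's actual API.
fn collapse(input: &str) -> Vec<String> {
    let mut folded = Vec::new();
    let mut frames: Vec<String> = Vec::new();
    let mut in_stack = false;

    for line in input.lines() {
        let line = line.trim();
        if !in_stack {
            // A lone "@[" opens a multi-line stack. Everything else,
            // including empty stacks like "@[]: 5", is skipped (assumption).
            if line == "@[" {
                in_stack = true;
                frames.clear();
            }
        } else if let Some(count) = line.strip_prefix("]: ") {
            // "]: N" closes the stack and carries the sample count.
            if !frames.is_empty() {
                // The sample lists the leaf frame first; folded lines are root-first.
                let stack: Vec<&str> = frames.iter().rev().map(String::as_str).collect();
                folded.push(format!("{} {}", stack.join(";"), count));
            }
            in_stack = false;
        } else if !line.is_empty() {
            // Drop the trailing "+offset" so identical functions merge.
            let name = match line.rfind('+') {
                Some(i) => &line[..i],
                None => line,
            };
            frames.push(name.to_string());
        }
    }
    folded
}

fn main() {
    let sample = "@[\n    pipe_poll+5\n    do_sys_poll+594\n]: 1\n@[]: 5\n";
    for line in collapse(sample) {
        println!("{}", line);
    }
}
```

Fed the first stack from the sample above, this sketch would produce `entry_SYSCALL_64_after_hwframe;do_syscall_64;__se_sys_poll;do_sys_poll;pipe_poll 1`. A real port would also need to handle whatever ustack output and unresolved addresses look like, which this does not attempt.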

A follow-up comment to brendangregg/FlameGraph#201 points out that they're actually adding direct support for the perf output format in bpftrace, so I'm going to close this for the same reason we cut the bpftrace parts from #56.

For interested readers, the hope is that with bpftrace/bpftrace#438 + bpftrace/bpftrace#430 + bpftrace/bpftrace#56, it should be possible to simply run

# bpftrace -e 'profile:hz:49 { @[kstack(folded), ustack(folded)]++ } interval:s:30 { exit() }' | flamegraph > out.svg

The folding happens entirely within bpftrace, and produces a format that is directly consumable by flamegraph (both ours and theirs hopefully).

Hey @jonhoo, I lost track of this, but thanks for your kind reply and I'm glad things worked!