janestreet/magic-trace

Fix broken stack traces on Go code

cgaebel opened this issue · 3 comments

In this simple Go example, it's clear that Go's stack switch causes stack traces to wander off the right-hand side of the screen. I think this is easy to fix: when trace_writer.ml sees the symbol runtime.newstack, it should mark all currently-open stack frames as closed.
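For reference, here is a sketch of the kind of program that reproduces this (not my exact example; deep recursion forces stack growth, which goes through runtime.newstack):

```go
// Sketch of a repro: deep recursion overflows the goroutine's initial
// stack, so the compiled prologue calls runtime.morestack, which calls
// runtime.newstack to grow it. The frame padding is an assumption to
// make growth happen after relatively few calls.
package main

import "fmt"

//go:noinline
func grow(n int) int {
	var pad [256]byte // enlarge each frame so recursion forces stack growth
	pad[0] = 1
	if n == 0 {
		return 0
	}
	return grow(n-1) + int(pad[0])
}

func main() {
	fmt.Println(grow(100000))
}
```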

I'm not sure whether there is any more custom control flow in Go code. For example, do Go stacks shrink? Please file more bugs if you notice any.

Hi, I just discovered this project and it looks like an amazingly useful tool, and I'd love to get it working better with Go. I haven't tried running it myself yet, but here are some initial notes:

I think this is easy to fix: when trace_writer.ml sees the symbol runtime.newstack, it should mark all currently-open stack frames as closed.

runtime.newstack calls into the scheduler [1] (I think this is the truncated runt... bit under gopreempt_m in #98), which selects a new goroutine to run. The precise moment we switch to the new goroutine's stack and start executing is in gogo, so I think that is the best point to mark stack frames as closed. Note the symbol is gogo, not runtime.gogo (runtime.gogo jumps to gogo).
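If it helps for testing, here is a sketch of a program that should force lots of goroutine switches; my assumption is that every blocking channel operation parks the current goroutine, and the scheduler starts the next one via gogo:

```go
// Two goroutines ping-ponging on unbuffered channels: each send/receive
// blocks, so the scheduler constantly switches between them.
package main

import "fmt"

func main() {
	ping, pong := make(chan int), make(chan int)
	go func() {
		for n := range ping {
			pong <- n + 1
		}
	}()
	n := 0
	for i := 0; i < 100000; i++ {
		ping <- n // park here; the scheduler switches to the other goroutine
		n = <-pong
	}
	fmt.Println(n)
}
```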

I'm not sure whether there is any more custom control flow in Go code. For example, do Go stacks shrink?

  • Many places within the runtime switch from the "goroutine stack" to the "system stack", which is a fixed-size thread stack where we do things that aren't safe on the goroutine stack. We switch back to the goroutine stack before returning to user code. morestack actually switches to the system stack before calling newstack, so it seems this is already working fine.
  • Signal handlers run on yet another "signal stack". This is set up with sigaltstack to apply automatically on signal delivery, so my guess is that it is already handled correctly.
  • Go stacks can shrink. This can occur either in newstack or concurrently during the GC. In the GC case, we temporarily freeze the goroutine using our standard preemption mechanisms, copy the stack to a smaller stack, and then allow it to resume with the new SP. It is unclear to me whether moving stacks (grow/shrink) will be problematic in and of themselves. Do you track actual SP values to determine when stack frames return? (I've put a sketch of the shrink case right after this list.)
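Here is that sketch of the GC shrink case (assumptions on my part: the GC considers shrinking a goroutine's stack when the goroutine is using well under its allocated size, and a couple of explicit GC cycles are enough to trigger it; the exact policy is a runtime-internal detail):

```go
// Grow a goroutine's stack via deep recursion, then block with a tiny
// stack footprint and run GC cycles so the runtime can shrink the stack.
package main

import (
	"fmt"
	"runtime"
)

//go:noinline
func grow(n int) int {
	var pad [512]byte // enlarge frames so the recursion grows the stack
	pad[0] = 1
	if n == 0 {
		return 0
	}
	return grow(n-1) + int(pad[0])
}

func main() {
	grown := make(chan struct{})
	release := make(chan struct{})
	done := make(chan struct{})
	go func() {
		grow(50000) // grow this goroutine's stack to a few MiB
		close(grown)
		<-release // block while using almost none of that stack
		close(done)
	}()
	<-grown
	runtime.GC() // each cycle is an opportunity for the runtime to shrink the stack
	runtime.GC()
	close(release)
	<-done
	fmt.Println("done")
}
```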

[1] A bit more background: the newstack call in your example isn't actually growing the stack. Most functions contain a prologue that checks SP against the stack bounds to see if we need to grow the stack by calling newstack. For cooperative preemption, we poison the stack bounds so the prologue thinks it needs to grow the stack, but newstack does the more complex work to decide that no, we really just need to preempt.
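A sketch that should exercise this path (assumption: run with GODEBUG=asyncpreemptoff=1 so that signal-based preemption, added in Go 1.14, is disabled, and preemption has to go through the poisoned stack bound and the prologue):

```go
// A tight loop whose only preemption points are stack-check prologues.
package main

import "runtime"

//go:noinline
func inner(x int) int { return x + 1 }

// Non-leaf, so the compiler should emit a stack-check prologue here
// (my understanding: small leaf functions get marked nosplit and skip it).
//go:noinline
func work(x int) int { return inner(x) }

func main() {
	runtime.GOMAXPROCS(1)
	go func() {
		x := 0
		for {
			// When the runtime poisons this goroutine's stack bounds,
			// the prologue check in work lands in runtime.newstack,
			// which preempts rather than growing the stack.
			x = work(x)
		}
	}()
	// With a single P, these can only run once the spinner is preempted.
	for i := 0; i < 10; i++ {
		runtime.Gosched()
	}
}
```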

Thanks for taking a look at this. It's nice to have someone who actually knows how Go works (as opposed to us) working on this.

To answer your question: magic-trace doesn't track or know anything about stack pointers. The stack frames are entirely inferred from the history of the application's control flow (e.g. call, ret, jmp). From your description, it sounds like stack shrinking is uninteresting to magic-trace.
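To sketch what I mean (illustrative only, not magic-trace's actual implementation; the event stream and the gogo handling are made up for the example):

```go
// Toy frame inference over a control-flow event stream, with the gogo
// special case discussed above bolted on.
package main

import "fmt"

// Event is a simplified control-flow event; real traces carry addresses
// that get symbolized, but symbols are enough to illustrate the idea.
type Event struct {
	Kind   string // "call" or "ret"
	Symbol string
}

func main() {
	events := []Event{
		{"call", "main.main"},
		{"call", "main.work"},
		{"call", "runtime.morestack"},
		{"call", "runtime.newstack"},
		{"call", "gogo"}, // goroutine switch (reached by a jump in reality)
		{"call", "main.other"},
		{"ret", ""},
	}

	var stack []string
	for _, ev := range events {
		switch {
		case ev.Kind == "call" && ev.Symbol == "gogo":
			// Close every open frame before the new goroutine starts.
			stack = stack[:0]
			stack = append(stack, ev.Symbol)
		case ev.Kind == "call":
			stack = append(stack, ev.Symbol)
		case ev.Kind == "ret" && len(stack) > 0:
			stack = stack[:len(stack)-1]
		}
		fmt.Printf("%-24s -> %v\n", ev.Kind+" "+ev.Symbol, stack)
	}
}
```

Since only the call/ret/jmp history matters here, a stack move that doesn't change control flow is invisible to this model.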

I've put up a draft PR that works on a very simple demo program. Thanks for the gogo hint; it was a huge help.

Could you please take a look at that PR and try that version of magic-trace on some interesting Go programs you may have lying around?