golang/go

runtime/trace: execution tracer overhaul

mknyszek opened this issue · 29 comments

Execution tracer overhaul

Authored by mknyszek@google.com with a mountain of input from others.

In no particular order, thank you to Felix Geisendorfer, Nick Ripley, Michael Pratt, Austin Clements, Rhys Hiltner, thepudds, Dominik Honnef, and Bryan Boreham for your invaluable feedback.

Background

Original design document.

Go execution traces provide a moment-to-moment view of what happens in a Go program over some duration. This information is invaluable in understanding program behavior over time and can be leveraged to achieve significant performance improvements. Because Go has a runtime, it can provide deep information about program execution without any external dependencies, making traces particularly attractive for large deployments.

Unfortunately limitations in the trace implementation prevent widespread use.

For example, the process of analyzing execution traces scales poorly with the size of the trace. Traces need to be parsed in their entirety to do anything useful with them, making them impossible to stream. As a result, trace parsing and validation has very high memory requirements for large traces.

Also, Go execution traces are designed to be internally consistent, but don't provide any way to align with other kinds of traces, for example OpenTelemetry traces and Linux sched traces. Alignment with higher level tracing mechanisms is critical to connecting business-level tasks with resource costs. Meanwhile alignment with lower level traces enables a fully vertical view of application performance to root out the most difficult and subtle issues.

Lastly, the implementation of the execution tracer has evolved organically over time and it shows. The codebase also has many old warts and some age-old bugs that make collecting traces difficult, and seem broken. Furthermore, many significant decision decisions were made over the years but weren't thoroughly documented; those decisions largely exist solely in old commit messages and breadcrumbs left in comments within the codebase itself.

Thanks to work in Go 1.21 cycle, the execution tracer's run-time overhead was reduced from about -10% throughput and +10% request latency in web services to about 1% in both for most applications. This reduced overhead in conjunction with making traces more scalable enables some exciting and powerful new opportunities for traces.

Goals

The goal of this document is to define an alternative implementation for Go execution traces that scales up to large Go deployments.

Specifically, the design presented aims to achieve:

  • Make trace parsing require a small fraction of the memory it requires today.
  • Streamable traces, to enable analysis without storage.
  • Partially self-describing traces, to reduce the upgrade burden on trace consumers.
  • Fix age-old bugs and present a path to clean up the implementation.

This document also describes the existing state of the tracer in detail and explains how we got there.

Design

Link to design document.

CC @felixge @nsrip-dd @prattmic @aclements @rhysh @dominikh @bboreham @thepudds

Change https://go.dev/cl/503038 mentions this issue: design/60773-execution-tracer-overhaul.md: add design

Change https://go.dev/cl/494187 mentions this issue: runtime: add execution tracer v2 experiment

Change https://go.dev/cl/515635 mentions this issue: runtime: refactor runtime->tracer API to appear more like a lock

Change https://go.dev/cl/538515 mentions this issue: cmd/trace: Add support for v2 traces

Change https://go.dev/cl/540256 mentions this issue: internal/trace: implement MutatorUtilizationV2

Change https://go.dev/cl/540778 mentions this issue: runtime: enable the exectracer2 experiment by default

Change https://go.dev/cl/541695 mentions this issue: internal/trace/v2: don't enforce batch order on Ms

Change https://go.dev/cl/541696 mentions this issue: internal/trace/v2: resolve syscall parsing ambiguity

Change https://go.dev/cl/541257 mentions this issue: cmd/trace/v2: add goroutine analysis pages

Change https://go.dev/cl/541259 mentions this issue: internal/trace/traceviewer: make the mmu handler more self-contained

Change https://go.dev/cl/541258 mentions this issue: cmd/trace: common up the mmu page and add it to cmd/trace/v2

Change https://go.dev/cl/541999 mentions this issue: cmd/trace/v2: add support for pprof endpoints

Change https://go.dev/cl/541998 mentions this issue: cmd/trace: refactor pprof HTTP SVG serving into traceviewer

Change https://go.dev/cl/542076 mentions this issue: internal/trace: add task analysis for v2 traces

Change https://go.dev/cl/542077 mentions this issue: cmd/trace/v2: add support for task and region endpoints

Change https://go.dev/cl/542001 mentions this issue: cmd/trace: factor out durationHistogram

Change https://go.dev/cl/542218 mentions this issue: cmd/trace: add basic support for v2 traces

Change https://go.dev/cl/543019 mentions this issue: internal/trace/v2: redefine NoTask and add BackgroundTask

Change https://go.dev/cl/543595 mentions this issue: cmd/trace/v2: add support for goroutine filtering

Change https://go.dev/cl/543596 mentions this issue: cmd/trace/v2: add support for a task-oriented procs-based view

Change https://go.dev/cl/543695 mentions this issue: cmd/trace/v2: add thread-oriented mode for v2 traces

Change https://go.dev/cl/543937 mentions this issue: cmd/trace/v2: add support for the goroutine-oriented task view

Change https://go.dev/cl/543995 mentions this issue: cmd/trace/v2: emit regions in the goroutine-oriented task view

Change https://go.dev/cl/546026 mentions this issue: doc: add release notes for the new execution tracer

@mknyszek the Future work section of the proposal mentions the following:

Each event has a single header word that includes the event type (8 bits), space for a 4-bit event reason, and the timestamp delta (20 bits). Note that every event requires a timestamp. At a granularity of 64 nanoseconds, this gives us a timestamp range of ~1 second, which is more than enough for most cases. In the case where we can‘t fit the delta in 24 bits [...]

Does the timestamp delta have 20 bits or 24? At 20 bits and 64 ns resolution, it'd only be 67 ms, not 1s.

Furthermore, I'm not convinced that 64 ns granularity is sufficient. In a trace produced by go test net/http I am seeing delta times as short as 1ns. For example, for a gcBgMarkWorker goroutine, I am seeing the sequence Runnable, Running, Waiting, Runnable, Running, with the following durations for each state: 6ns, 2ns, 4ns, 3ns, 45ns. That entire sequence of events would get squished down to the same timestamp at 64 ns resolution. Although I'm not quite sure how we transitioned from Running to Waiting in just 2ns.

@dominikh Unfortunately that document is out of date; I plan to rewrite a good chunk of it in the new year. There are things I didn't get to, and things that had to be implemented differently. (The core goals and ideas still apply, but the implementation was different, generally to avoid unforeseen limitations and/or bugs in the proposed changes.) Timestamps are encoded the same way they were before, but the new timestamp resolution is indeed 64 ns. The parser will avoid equal timestamps by incrementing by 1 ns; it will do the same thing if timestamps are out-of-order, which is why you see these odd-looking deltas.

Increasing the resolution is easy, it's just one constant and changing it is backwards compatible. If you're finding that 64 ns is not good enough, we can change it.

Change https://go.dev/cl/576256 mentions this issue: runtime, cmd/trace: remove code paths that include v1 tracer

Change https://go.dev/cl/576257 mentions this issue: runtime: rename v2 execution tracer files

Change https://go.dev/cl/576258 mentions this issue: internal/goexperiment: remove the ExecTracer2 experiment