Enhancement Request: Lazy Evaluation of Attributes for OpenTelemetry Spans

Question

ianks opened this issue 10 months ago · 3 comments

Currently, OpenTelemetry span attributes are evaluated eagerly when the span is created, as in the following example:

MyTracer.in_span('make_a_lot_of_allocations', attributes: { eagerly: array.join("|") }) do 
  # ...
end

In our production SFR application, we frequently create OpenTelemetry spans, but only sample a small fraction of requests to prevent excessive data collection. However, we incur the cost of all attribute hash object allocations regardless of whether the request is sampled or not. This leads to substantial stress on the Garbage Collector (GC), adversely affecting our p90+ request times.

Enhancement Proposal:

I'd love for the SDK to provide a mechanism to lazily generate attributes for the span. This way, we would only incur the allocation cost if the request is sampled for OpenTelemetry.

This could be achieved by adding an API that allows a proc to be passed as attributes, as demonstrated in the following example:

MyTracer.in_span('make_a_lot_of_allocations', attributes: proc { Hash[lazily: array.join("|")] }) do
  # ...
end

With this enhancement, the span attributes would only be created at the time of exporting, if needed. This could potentially improve performance by reducing unnecessary GC pressure.

Answer 1 · 2023-12-07T02:53:14.000Z

Thanks for the suggestion. Unfortunately, this is out of scope for OpenTelemetry due to API spec compliance requirements. A mechanism exists to delay computation of attributes until after the span is sampled. This mechanism doesn't work for the SFR service because "sampling" is delayed until the request handler completes (we always create "recording" spans, and buffer them). This is a particularly unusual configuration (it is the only service at Shopify that does this), and only works when the service is a "leaf" on the service graph. This will shortly not be true for SFR, and its sampling will need to change.
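
To make that distinction concrete, here is a rough sketch of why deferring the keep-or-drop decision means the attribute cost is always paid. `BufferedTrace` and its methods are hypothetical and only imitate the "always record, buffer, decide at the end" arrangement described above; they are not SFR's or OpenTelemetry's actual code.

```ruby
# Hypothetical illustration only: every span is recorded into a buffer and the
# keep-or-drop decision happens once, when the request finishes.
class BufferedTrace
  attr_reader :exported

  def initialize
    @buffer = []
    @exported = []
  end

  # Attributes arrive already built, so their cost has been paid by this point.
  def record(name, attributes)
    @buffer << { name: name, attributes: attributes }
  end

  # Called when the request handler completes.
  def finish(keep:)
    @exported = keep ? @buffer : []
    @buffer = []
  end
end

computed = 0
trace = BufferedTrace.new
3.times do |i|
  computed += 1 # stands in for an expensive attribute computation
  trace.record("span-#{i}", { 'index' => i })
end
trace.finish(keep: false)
```

In this sketch all three attribute hashes are built even though nothing ends up exported, because no span knows at creation time whether it will be kept.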

Answer 2 · 2023-12-07T16:04:55.000Z

> A mechanism exists to delay computation of attributes until after the span is sampled.

I did not know about this, can you expand a bit?

Answer 3 · 2023-12-11T16:03:04.000Z

tracer.in_span('foo', attributes: {'eagerly evaluated' => 42}) do |span|
  span['lazily evaluated'] = compute_expensive_attribute if span.recording?
  # ...
end

Here, span.recording? will only be true if the span is sampled "in" by the configured sampler.
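
A minimal, dependency-free sketch of that guard, showing the expensive computation being skipped for a non-recording span. `StubSpan`, `with_lazy_attributes`, and the `calls` counter are hypothetical stand-ins for illustration and are not part of the OpenTelemetry API; only `recording?` and `set_attribute` mirror methods on the real span.

```ruby
# Hypothetical stand-in for a span; only recording? and set_attribute
# mirror the real OpenTelemetry span interface.
class StubSpan
  attr_reader :attributes

  def initialize(recording)
    @recording = recording
    @attributes = {}
  end

  def recording?
    @recording
  end

  def set_attribute(key, value)
    @attributes[key] = value if @recording
    self
  end
end

# Hypothetical helper: calls the block and copies its hash onto the span
# only when the span is recording; otherwise the block never runs.
def with_lazy_attributes(span)
  return span unless span.recording?

  yield.each { |key, value| span.set_attribute(key, value) }
  span
end

calls = 0
expensive = lambda do
  calls += 1 # counts how often the expensive work actually runs
  { 'lazily evaluated' => %w[a b c].join('|') }
end

sampled_out = with_lazy_attributes(StubSpan.new(false), &expensive)
sampled_in  = with_lazy_attributes(StubSpan.new(true), &expensive)
```

With a real tracer, the same helper could be called inside the `in_span` block with the yielded span. As noted earlier in the thread, this only saves work when the sampler decides at span creation; if every span is recording, the block always runs.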