tokio-rs/tracing-opentelemetry

Delayed span submission for request root spans and children until trace parent information can be extracted from request.

taladar opened this issue · 6 comments

Feature Request

Often the trace parent information is only available after some processing steps, e.g. in HTTP you might first accept the TCP connection, then possibly do a TLS handshake, then create some per connection state objects and only then parse the HTTP headers with the trace parent.

It would be useful to have a way to delay submission of the request root span and all of its child spans until that point, so that they can be submitted with the correct trace ID from the trace parent header (or the equivalent information in other protocols), or until we can explicitly signal that the request carries no trace parent information.

It might also be useful to have an easy way to access the (request) root span from the current span.

Motivation

Currently, the alternative is to set the trace parent on whichever span is current when that information becomes accessible. That cuts off all the earlier parents, and it also attributes extra time to the process that sent the trace parent header, because the timing information from those earlier parents is not shown as part of the distributed trace.

Proposal

The drawback would be some added complexity in the submission logic, since it would have to delay some spans. Those spans would also not be available at all if the code never reaches the point where it tries to extract trace parent information from the request.

Alternatives

The OpenTelemetry project could create a message that links two existing trace IDs so they are displayed as one trace anyway. Considering how many implementations there are, it would likely take a while for support to spread far enough for that to become useful.

The OTel trace spec does not really provide a way to adjust or delay span creation. The closest option seems to be linking the request span as a span that follows from the TCP/TLS spans, via Span::follows_from, so that visualization systems can connect the two traces.

That being said, this crate already deviates from the spec by allowing you to update span data dynamically: it effectively pairs an OTel span builder with a given tracing span, then creates and submits the OTel span on close. You could do something similar, but you would need a way to buffer the span data you want to update and export.

It could be easier to use the otel api directly instead of the tracing api, and maintain a vec of span builders that you could submit whenever you know the trace id? Or if you wanted to keep using the tracing api for the buffered spans you could add an API to "extract" the span builder stored in OtelData so you could manually submit it later.

I have tried Span::follows_from but it doesn't really do anything very visible in the tools I have tried. Mostly it just adds a link somewhere deep in the span to the span in the other trace.

I guess what I would need is a way to hook into the export process between the tracing spans and the actual exporter (e.g. OTLP). The hook would hold back all spans that carry the trace ID of a request root span until some later part of the program sets the trace parent from the parsed header, or confirms that the request has no such header. At that point it would replace the trace ID with the one from the trace parent header in all spans and sub-spans created so far, and unlock their export.
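A minimal sketch of such a hook, using only the standard library. Everything here is hypothetical: `FinishedSpan`, `DelayedExporter`, `on_span_end` and `resolve` are illustrative names, not part of tracing, tracing-opentelemetry or the OpenTelemetry SDK, and the `exported` vector merely stands in for a real exporter. Re-parenting the root span under the remote span ID is not modeled.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a finished span (not an OTel type).
#[derive(Debug, Clone, PartialEq)]
pub struct FinishedSpan {
    pub trace_id: u128,
    pub span_id: u64,
    pub name: String,
}

/// Holds back finished spans, keyed by their provisional trace ID, until the
/// request's trace parent is known or known to be absent.
#[derive(Default)]
pub struct DelayedExporter {
    /// Spans waiting for a decision, grouped by provisional trace ID.
    pending: HashMap<u128, Vec<FinishedSpan>>,
    /// Decisions already taken: provisional trace ID -> optional remote trace ID.
    resolved: HashMap<u128, Option<u128>>,
    /// Stand-in for the real exporter (e.g. OTLP).
    exported: Vec<FinishedSpan>,
}

impl DelayedExporter {
    /// Called when a span closes. Spans of undecided traces are buffered;
    /// spans of already decided traces are rewritten (if needed) and forwarded.
    pub fn on_span_end(&mut self, mut span: FinishedSpan) {
        match self.resolved.get(&span.trace_id) {
            Some(remote) => {
                if let Some(id) = remote {
                    span.trace_id = *id;
                }
                self.exported.push(span);
            }
            None => self.pending.entry(span.trace_id).or_default().push(span),
        }
    }

    /// Called once the header has been parsed. `Some(id)` replaces the
    /// provisional trace ID; `None` confirms that there is no trace parent.
    /// Returns the number of buffered spans that were released.
    pub fn resolve(&mut self, provisional: u128, remote: Option<u128>) -> usize {
        self.resolved.insert(provisional, remote);
        let spans = self.pending.remove(&provisional).unwrap_or_default();
        let released = spans.len();
        for mut span in spans {
            if let Some(id) = remote {
                span.trace_id = id;
            }
            self.exported.push(span);
        }
        released
    }

    /// Spans handed to the stand-in exporter so far.
    pub fn exported(&self) -> &[FinishedSpan] {
        &self.exported
    }
}
```

In this sketch a trace that never calls `resolve` keeps its spans buffered forever, which is the drawback mentioned in the proposal; a real implementation would presumably need a timeout or a size limit.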

Apart from this problem the tracing API is really quite convenient and I would like to avoid having to reimplement everything it provides.

Is keeping the root span around and then setting its parent directly not an option?

It sounds to me like what you're after is a way to iterate through a span's parents up to the current root span and set its parent to the context received through HTTP, right? Basically something like Span::set_root_parent(&self, ctx: Context)?

I have actually tried keeping the parent around earlier today. That works for the parent but all the existing child spans for that parent seem to use the old trace id. I haven't quite figured out how the trace id is propagated yet but I assume that happens at span creation time?

Yeah, you're right. So for your use case the trace ID should be propagated recursively to all children of that span, right?

I'm not sure if that's even possible as some of the spans may have already been closed and thus sent through otel.
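The exchange above can be modeled with a toy span tree. This is only a sketch under stated assumptions: `SpanNode`, `root_of` and `set_root_trace_id` are hypothetical names, spans live in a plain slice with parent indices, and `closed` stands for "already sent through otel". It is not how tracing or tracing-opentelemetry stores spans.

```rust
/// Hypothetical in-memory span record (not a tracing or OpenTelemetry type).
#[derive(Debug)]
pub struct SpanNode {
    /// Index of the parent span in the slice, `None` for a root span.
    pub parent: Option<usize>,
    pub trace_id: u128,
    /// True once the span has been closed and exported.
    pub closed: bool,
}

/// Follows parent links from `start` up to the root and returns the root's index.
pub fn root_of(spans: &[SpanNode], start: usize) -> usize {
    let mut current = start;
    while let Some(parent) = spans[current].parent {
        current = parent;
    }
    current
}

/// Sets `trace_id` on every still-open span in the tree that contains `start`.
/// Returns the indices of spans in that tree that were already closed and
/// therefore could not be updated.
pub fn set_root_trace_id(spans: &mut [SpanNode], start: usize, trace_id: u128) -> Vec<usize> {
    let root = root_of(spans, start);
    let mut missed = Vec::new();
    for i in 0..spans.len() {
        if root_of(spans, i) != root {
            continue; // span belongs to a different trace
        }
        if spans[i].closed {
            missed.push(i);
        } else {
            spans[i].trace_id = trace_id;
        }
    }
    missed
}
```

The returned list makes the concern concrete: any span that closed before the header was parsed keeps its old trace ID unless its export is delayed.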

Yeah, or at the very least it would be nice if the spans had a way to set trace id (and other state transmitted by the traceparent/tracestate headers) if I keep them all around manually.
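For reference, the state carried by a W3C `traceparent` header can be extracted with a few lines of standard-library Rust. This is a hedged sketch, not the parser used by any OpenTelemetry crate: `TraceParent` and `parse_traceparent` are hypothetical names, only version `00` is accepted, and `tracestate` is ignored.

```rust
/// Hypothetical holder for the fields of a `traceparent` header.
#[derive(Debug, PartialEq)]
pub struct TraceParent {
    pub trace_id: u128,
    pub parent_span_id: u64,
    pub sampled: bool,
}

/// Parses `00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>`.
/// Returns `None` for anything that does not match this shape.
pub fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let mut parts = header.trim().split('-');
    let version = parts.next()?;
    let trace_id = parts.next()?;
    let parent_id = parts.next()?;
    let flags = parts.next()?;
    // Version 00 has exactly four fields.
    if parts.next().is_some() || version != "00" {
        return None;
    }
    if trace_id.len() != 32 || parent_id.len() != 16 || flags.len() != 2 {
        return None;
    }
    // Check the characters explicitly: `from_str_radix` would also accept a
    // leading '+' and uppercase digits, which the header format does not allow.
    let is_lower_hex = |s: &str| s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    if !(is_lower_hex(trace_id) && is_lower_hex(parent_id) && is_lower_hex(flags)) {
        return None;
    }
    let trace_id = u128::from_str_radix(trace_id, 16).ok()?;
    let parent_span_id = u64::from_str_radix(parent_id, 16).ok()?;
    let flags = u8::from_str_radix(flags, 16).ok()?;
    // All-zero IDs are invalid.
    if trace_id == 0 || parent_span_id == 0 {
        return None;
    }
    Some(TraceParent {
        trace_id,
        parent_span_id,
        sampled: flags & 0x01 == 0x01,
    })
}
```

The trace ID, parent span ID and sampled flag returned here are the values that would have to be applied to the spans kept around manually.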