w3c/trace-context

Proposal: Add a "propagation-only-parent" flag to be set to true if the parent is a non-tracing service

bogdandrutu opened this issue · 6 comments

Background

Starting from the definition of the "minimum" interaction:

At a minimum they MUST propagate the traceparent and tracestate headers and guarantee traces are not broken. This behavior is also referred to as forwarding a trace.

Let's assume the following scenario, where 3 services are called during a request, A -> B -> C (Service A calls Service B, which calls Service C), and assume that the owner of Service B wants tracing disabled, so they will do only the "minimum" interaction defined above. The trace will then look like Service A -> Service C (Service A calls Service C directly), which does not reflect reality.
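To make the scenario concrete, here is a minimal sketch of why the trace collapses to A -> C. It assumes the usual `version-traceid-parentid-flags` layout of the traceparent header; the helper names `participate` and `forward` are made up for illustration and are not part of any library.

```python
import secrets
from typing import Tuple


def participate(traceparent: str) -> Tuple[str, str, str]:
    """A tracing service: record a span and send its own span id downstream.

    Returns (outgoing_header, own_span_id, recorded_parent_id).
    """
    version, trace_id, parent_id, flags = traceparent.split("-")
    span_id = secrets.token_hex(8)  # 8 random bytes as 16 hex characters
    outgoing = f"{version}-{trace_id}-{span_id}-{flags}"
    return outgoing, span_id, parent_id


def forward(traceparent: str) -> str:
    """A propagation-only service: pass the header through unchanged."""
    return traceparent


# Header sent by Service A (example values in the traceparent format).
from_a = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"

from_b = forward(from_a)  # Service B records nothing
_, c_span_id, c_parent_id = participate(from_b)  # Service C records a span

# Service C's recorded parent is Service A's span id, so the backend sees A -> C.
a_span_id = from_a.split("-")[2]
```

Because Service B leaves the parent-id field untouched, nothing in the header that Service C receives tells it that another hop was involved.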

Proposal

Add information to the trace-flags indicating that the request recently passed through a service that did not participate in the trace. For this we can define a "propagation-only-parent" flag bit with the following behavior:

  • If a service does only the minimum propagation, it sets this bit to 1 when it sends the header to the next service, to signal that the parent was propagation-only.
  • If a service participates in the trace, it sets this bit to 0 to signal that the parent is not propagation-only.
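The two rules above could be sketched roughly as follows. This is only an illustration: the bit position `0x04` is an arbitrary choice made here and is not assigned by any specification, and the function name is hypothetical. The only thing taken from the spec is that trace-flags is an 8-bit field encoded as two hex characters, with `0x01` meaning "sampled".

```python
SAMPLED = 0x01                  # defined by the Trace Context spec
PROPAGATION_ONLY_PARENT = 0x04  # hypothetical bit, chosen only for this sketch


def with_propagation_only_parent(traceparent: str, propagation_only: bool) -> str:
    """Return the header with the hypothetical flag bit set or cleared."""
    version, trace_id, parent_id, flags = traceparent.split("-")
    value = int(flags, 16)
    if propagation_only:
        value |= PROPAGATION_ONLY_PARENT            # rule 1: forwarding service sets the bit
    else:
        value &= ~PROPAGATION_ONLY_PARENT & 0xFF    # rule 2: participating service clears it
    return f"{version}-{trace_id}-{parent_id}-{value:02x}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
forwarded = with_propagation_only_parent(incoming, True)    # flags become "05"
traced = with_propagation_only_parent(forwarded, False)     # flags back to "01"
```

A participating service would clear the bit in addition to replacing the parent-id with its own span id; the sketch only shows the flag handling and leaves the other flag bits as they were.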

How does this help?

Knowing whether or not the parent participates in the trace is critical information that the backend can use when showing parent-child relationships and to inform the user about the correct connection between the services. In this case, what the backend may do is inform the user that there are intermediate services that are not participating in the trace (the backend cannot know how many there are or what they are doing, since they decided not to participate in the trace).
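As a rough idea of what a backend could do with this, the sketch below labels each parent-child edge depending on whether the child received the flag. The `Span` shape, the `parent_was_propagation_only` field, and the label strings are all assumptions for illustration, not an existing backend's data model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Span:
    service: str
    span_id: str
    parent_id: Optional[str]
    # Hypothetical field: value of the proposed flag on the incoming header.
    parent_was_propagation_only: bool = False


def describe_edges(spans: List[Span]) -> List[Tuple[str, str, str]]:
    """Return (parent service, child service, label) for every known edge."""
    by_id = {s.span_id: s for s in spans}
    edges = []
    for span in spans:
        parent = by_id.get(span.parent_id)
        if parent is None:
            continue  # root span, or the parent span was never reported
        label = (
            "via untraced service(s)"
            if span.parent_was_propagation_only
            else "direct"
        )
        edges.append((parent.service, span.service, label))
    return edges


spans = [
    Span("A", "00f067aa0ba902b7", None),
    Span("C", "b7ad6b7169203331", "00f067aa0ba902b7", parent_was_propagation_only=True),
]
edges = describe_edges(spans)
```

The label can only say that something untraced sat between A and C; it cannot say how many hops there were or what they were.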

Questions

  • Do we need an extra bit to represent if the "propagation-only-parent" bit is set or not?

My concern with this is what it means to be a "service". Is an Istio sidecar a service? Should it set this bit? If it doesn't, how about a standalone proxy like nginx?

I do agree that in some scenarios, like a sidecar, skipping this may be the right thing, but if we have a "standalone service" which is not a proxy/lb/sidecar, it would be good to know this info.

Happy to have these kinds of comments/recommendations. We could also make this an "optional" flag, so that the service owner decides whether to set it based on whether it makes sense for the service type. So not having the flag set does not mean that your parent is traced, but that the parent is a "relevant" service (or whatever we call it).

I do agree that there are caveats, but I do also see a gap here, and would like to have a solution that at least works for real cases like we have in my current company :).

Sorry for the late reply. We discussed this in a working group meeting. It would be helpful to describe possible use cases in more detail. There is a description about that in the "How does this help?" section already, but so far it remains a bit vague.

Knowing whether or not the parent participates in the trace is critical information that the backend can use when showing parent-child relationships and to inform the user about the correct connection between the services. In this case, what the backend may do is inform the user that there are intermediate services that are not participating in the trace (the backend cannot know how many there are or what they are doing, since they decided not to participate in the trace).

So the only information a downstream service can get from this is whether or not there were any intermediate services, but no additional context or information. How would an observability backend use this information? How is displaying this helpful for users of an observability product? Do you have examples of real world use cases for that information?

and would like to have a solution that at least works for real cases like we have in my current company :).

Can you expand on that real world use case?

This can be achieved by utilizing tracestate. The tracestate approach is also better (as discussed today at the meeting) for another use case, where a "proxy" service retries and wants to append the retry # to tracestate without changing traceparent. A multi-value tracestate key/value pair will work better than a boolean flag.
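A minimal sketch of that tracestate approach, assuming the comma-separated `key=value` list format of the header and the convention that a modified entry moves to the front of the list. The key name `retry` is invented here for illustration; no specification defines such a key. The sketch also skips key/value validation and the 32-entry limit.

```python
def upsert_tracestate(tracestate: str, key: str, value: str) -> str:
    """Set `key` to `value`, moving the entry to the front of the list."""
    members = [m.strip() for m in tracestate.split(",") if m.strip()]
    # Drop any existing entry for this key, keep the others in order.
    kept = [m for m in members if m.split("=", 1)[0] != key]
    return ",".join([f"{key}={value}"] + kept)


incoming = "congo=t61rcWkgMzE"
first_retry = upsert_tracestate(incoming, "retry", "1")      # "retry=1,congo=t61rcWkgMzE"
second_retry = upsert_tracestate(first_retry, "retry", "2")  # "retry=2,congo=t61rcWkgMzE"
```

The proxy changes only tracestate; traceparent is forwarded as received.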

This can be achieved by utilizing tracestate. The tracestate approach is also better (as discussed today at the meeting) for another use case, where a "proxy" service retries and wants to append the retry # to tracestate without changing traceparent. A multi-value tracestate key/value pair will work better than a boolean flag.

Feedback:

This will not work for us unless the tracestate is somehow standardized as well. Even for the number of retries, what is the "key" used in the tracestate? Same here: what is the key used for this? How does a proxy provider like Envoy agree with a service provider like Snowflake on that key?