tracestate response header proposal
One issue that is frequently raised with response headers is the absence of a mechanism similar to tracestate which can be used to return tracing-system-specific information to the caller. As an example, in a multi-tenant tracing system it may be desirable to return the tenant ID with the trace ID and span ID so that the caller knows which tenant to use to look up that trace.
Problem with tracestate response
The primary issue preventing the implementation of a tracestate response header in the past has been the inability to reliably merge tracestate response headers. For example, if node A calls node B and node C, which both return a tracestate in the response, how does node A handle conflicts between the tracestate responses from B and C if both return key foo with different values?
Proposal 1 - Do not backpropagate tracestate
Given the above example, node A would not return any tracestate values from the responses from node B or node C. In this case, the support use case described above still works because a customer of A can include A's tracestate response in a support request. A can provide any data in the tracestate it needs in order to look up the trace (cluster id, tenant id, etc).
This is the simplest implementation because the tracestate header can be a completely opaque value. A requester would not receive any tracestate from any of its grandchildren in the trace, but it still opens the possibility for many supportability use cases. A "proxy mode" or similar would also be desired in this case in order for a load balancer, firewall, proxy, or similar system to return the tracestate returned by the target unaltered.
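As a rough illustration of Proposal 1, the sketch below shows the decision a node would make when building its response. The function name, its parameters, and the `proxy_mode` flag are hypothetical and not part of any spec; the only behavior taken from the proposal is that a normal node returns its own opaque value and a proxy passes the target's value through unaltered.

```python
from typing import Optional


def response_tracestate(
    own_value: Optional[str],
    upstream_value: Optional[str] = None,
    proxy_mode: bool = False,
) -> Optional[str]:
    """Pick the tracestate value to send back to the caller (hypothetical helper).

    own_value:      opaque value this node wants to expose (e.g. a tenant id).
    upstream_value: tracestate received from the single target being proxied.
    proxy_mode:     assumed configuration flag for load balancers, firewalls, etc.
    """
    if proxy_mode:
        # Pass the target's tracestate through byte-for-byte; never parse it.
        return upstream_value
    # Normal mode: children's response tracestate is not backpropagated.
    return own_value
```

Because the value is never parsed in either branch, the header can stay fully opaque to everyone except the system that wrote it.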
Proposal 2 - Treat tracestate as a set of tokens
Given the above example, treat all tracestate entries as opaque tokens (e.g. a=1 is a distinct token from a=2). In this case, node A would concatenate the tracestate responses from B and C, optionally prepend its own token, and then make a pass through the full tracestate that removes all duplicate tokens before including it in the response.
This is slightly more complex to implement as it requires backpropagating the tracestate from children to parent. It could also cause the tracestate to grow quite large, although reasonable limits could be applied. It also raises the question of how to order the tokens from B and C.
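The merge described in Proposal 2 could look something like the following sketch. Several details are assumptions made only for illustration: the function name, comma-separated tokens, ordering children by arrival, and a cap of 32 tokens (borrowed from the request-side tracestate limit). The proposal itself leaves ordering and limits open.

```python
from typing import Iterable, Optional


def merge_tracestate(
    own_token: Optional[str],
    child_headers: Iterable[str],
    max_tokens: int = 32,  # assumed cap; the proposal only says "reasonable limits"
) -> str:
    """Concatenate child tracestate responses, prepend our own token, drop duplicates."""
    candidates = []
    if own_token:
        candidates.append(own_token)
    for header in child_headers:  # assumed order: as responses arrived
        candidates.extend(header.split(","))

    seen = set()
    merged = []
    for raw in candidates:
        token = raw.strip()
        # Tokens are opaque: "foo=1" and "foo=2" are different tokens.
        if not token or token in seen:
            continue
        seen.add(token)
        merged.append(token)
        if len(merged) == max_tokens:
            break
    return ",".join(merged)
```

With node A's own token `a=0` and responses `foo=1,bar=2` from B and `foo=2,bar=2` from C, both `foo` tokens survive because they are distinct, while the repeated `bar=2` is dropped.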
It is worth pointing out that Jonathan Mace's "baggage" paper advocated for implementing the merge semantics (or option 2). However, it does feel like a higher burden on implementers (many implementations propagate the context forward as immutable). I like option 1. Maybe in the future we could introduce different levels where both options are possible but option 2 is not required.
It wouldn't be too hard to write the spec such that we do proposal 1 for now but leave proposal 2 open for the future. For example, we could specify it as a comma-separated list but ignore all entries after the first in version 1.
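A version 1 reader under that suggestion might be as small as the sketch below. The function name is hypothetical, and skipping empty entries is an assumption about how leniently a parser would treat stray commas.

```python
from typing import Optional


def first_tracestate_entry(header_value: str) -> Optional[str]:
    """Return the first non-empty entry of a comma-separated tracestate response.

    Later entries are ignored, so a future version could start sending
    more of them without breaking version 1 readers.
    """
    for raw in header_value.split(","):
        entry = raw.strip()
        if entry:
            return entry
    return None
```

A caller receiving `tenant=42, foo=1` would see only `tenant=42`.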
I would like to revive this proposal for level 3. Specifically, I believe proposal 2 to be the more useful. I agree with Yuri that it does complicate implementation, but I believe the usefulness outweighs the cost. Option 2 is required if there is any middleman such as a proxy or routing service; otherwise the header will be lost.
The MAY semantic on back propagation sounds like a good plan. A few thoughts:
- I wonder if back propagation opens the door to any new information exposure problems. With forward propagation, one can inject a cleanup filter on the incoming request that ensures tracestate is cleaned up before any outgoing calls. With back propagation, it is unclear at what stage this cleanup filter must be set. Do all frameworks support this type of callback to clean up the whole tracestate on response?
- What to do with async calls? If a request initiates some background actions, does the framework need to collect all of their tracestates, even though none of them will ever be used? With forward propagation, a constant set of headers is stored and sent to background tasks forever. With back propagation, the tracestate will potentially grow with no use.