Vendor-neutral (OpenTracing?) tracing in sidecar processes
bhs opened this issue · 23 comments
This "issue" started life as a github gist. I wanted to publish it somewhere that's friendly to collaboration and cross-referencing on github, and hence the "port" to the github issue format. Without further ado:
Document/Issue purpose
To describe the importance of "general-purpose third-party L7 processes" (e.g., service meshes and sidecars) with respect to distributed tracing, to explain why it's problematic for OpenTracing today, and to outline the steps we can take to make OpenTracing more useful in that layer.
Background
The OpenTracing project aims to allow for vendor-neutral tracing instrumentation of distributed systems. The implicit assumption going in (which is documented in more detail in this blog post) is that the application-level microservice code participating in tracing is compiled/interpreted before it runs in production.
This is complicated by general-purpose third-party L7 processes; historically these were things like nginx and haproxy, though recently they've become more exciting and purpose-built for microservice intercommunication: the term of art is "service mesh", and linkerd and Envoy are leading the way. With the rise of Istio (and, to a less-specific extent, Kubernetes), service mesh technology promises to become an essential part of the migration to and operation of microservice architectures in our industry, and as such it is an important part of any tracing strategy.
There is some tracing support in these sidecar processes already (linkerd supports Zipkin via a plugin, Envoy supports both Zipkin and LightStep, etc.), but it's all bespoke and non-standardized. While OpenTracing seems like a natural conceptual fit here, its focus on programmatic APIs is an impedance mismatch with the problem at hand: even if linkerd or Envoy instruments with OpenTracing, it is not reasonable to expect their users to literally recompile the sidecar process in order to support some new OpenTracing-compatible system.
"Necessary but not Sufficient"
Note that tracing support in these L7 processes is beneficial and high-leverage, but the actual "first-party" application processes / services in between also must be traceable (per the existing OpenTracing scope and charter).
Life of a Traced RPC (in a sidecar)
Let's examine the life of a traced RPC as it passes through a sidecar process on its way from a client to a server and back.
- The RPC arrives, embedded within some sort of envelope that supports key:value metadata (e.g., HTTP/HTTP2 headers). The sidecar decodes both the message and its metadata; at this point it will try to locate and decode any trace context embedded in that metadata.
- The sidecar may create a Span to model its own latency impact on the RPC. If a trace context was found above, we inherit its sampling policy and make it the parent.
- The sidecar forwards the current trace context on to the server along with the application data.
- time passes, business value is created, AWS gets paid another microcent, etc...
- On the way back up from the server, all of the above unwinds; if there are errors, they should be set on the Span and reported to the client, etc.
- If it's sampled, the now-completed Span should be added to an internal buffer.
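To make the steps above concrete, here is a minimal Python sketch of the per-RPC path. It is not how linkerd or Envoy actually implement this: the `trace-context` metadata key, its `traceid:spanid:sampled` encoding, and the `handle_rpc`, `TraceContext` and `FinishedSpan` names are all hypothetical, and `forward` stands in for the call to the server.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical metadata key and encoding; the issue does not fix either one.
CONTEXT_KEY = "trace-context"


@dataclass
class TraceContext:
    trace_id: str
    span_id: str
    sampled: bool


@dataclass
class FinishedSpan:
    context: TraceContext
    parent_span_id: Optional[str]
    duration_s: float
    error: Optional[str] = None


def decode(value: str) -> TraceContext:
    # Hypothetical "traceid:spanid:sampled" form, for illustration only.
    trace_id, span_id, sampled = value.split(":")
    return TraceContext(trace_id, span_id, sampled == "1")


def encode(ctx: TraceContext) -> str:
    return f"{ctx.trace_id}:{ctx.span_id}:{'1' if ctx.sampled else '0'}"


def new_id() -> str:
    return f"{random.getrandbits(64):016x}"


def handle_rpc(metadata: Dict[str, str], body: bytes,
               forward: Callable[[Dict[str, str], bytes], bytes],
               span_buffer: List[FinishedSpan],
               root_sample_rate: float = 0.01) -> bytes:
    # Look for trace context in the inbound metadata (e.g. HTTP headers).
    parent = decode(metadata[CONTEXT_KEY]) if CONTEXT_KEY in metadata else None
    # The sidecar's own Span: inherit trace id and sampling decision if present.
    if parent is not None:
        ctx = TraceContext(parent.trace_id, new_id(), parent.sampled)
    else:
        ctx = TraceContext(new_id(), new_id(), random.random() < root_sample_rate)
    # Forward the application data plus the sidecar's trace context.
    outbound = {**metadata, CONTEXT_KEY: encode(ctx)}
    start = time.monotonic()
    error: Optional[str] = None
    try:
        return forward(outbound, body)
    except Exception as exc:
        error = repr(exc)  # record the error on the Span, then surface it
        raise
    finally:
        # Unwinding: only sampled, completed Spans go into the buffer.
        if ctx.sampled:
            span_buffer.append(FinishedSpan(
                ctx, parent.span_id if parent else None,
                time.monotonic() - start, error))
```

The sketch forwards the sidecar's own span id downstream, so the server's Span becomes a child of the sidecar's Span; a sidecar could instead pass the inbound context through unchanged.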
Out of band (the particular mechanism is not important):
- Incrementally fill up a Span buffer (perhaps based on a max size, a timer, request state, or something else).
- Flush that Span buffer to an arbitrary external tracing system.
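One way the out-of-band half could look, assuming a size-or-age flush policy (just one of the options listed above): a hypothetical `SpanBuffer` class where `export` stands in for whatever ships a batch to the tracing system. It is single-threaded for brevity; a real sidecar would need locking or per-thread buffers.

```python
import time
from typing import Callable, List, Optional


class SpanBuffer:
    """Hypothetical Span buffer that flushes when full or when too old."""

    def __init__(self, export: Callable[[List[object]], None],
                 max_spans: int = 512, max_age_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._export = export
        self._max_spans = max_spans
        self._max_age_s = max_age_s
        self._clock = clock
        self._spans: List[object] = []
        self._oldest: Optional[float] = None

    def add(self, span: object) -> None:
        if not self._spans:
            self._oldest = self._clock()  # age is measured from the first Span
        self._spans.append(span)
        if len(self._spans) >= self._max_spans:
            self.flush()

    def maybe_flush(self) -> None:
        # Meant to be called periodically (e.g. from a timer) by the host process.
        if self._spans and self._clock() - self._oldest >= self._max_age_s:
            self.flush()

    def flush(self) -> None:
        if not self._spans:
            return
        batch, self._spans, self._oldest = self._spans, [], None
        self._export(batch)
```

Injecting `clock` keeps the policy testable without real timers.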
How to make the above vendor-neutral
THE TWO IMPORTANT TRACING WIRE FORMATS
+---------------------+        (1)        +-------------------------------+
|                     |   app data and    |                 .             |
| Application Process |<---span context-->| Sidecar Process . Span Buffer |
|                     |    information    |                 .             |
+---------------------+                   +-------------------------------+
                                                          |
                                                         (2)
                                                out-of-band flush of
                                                  Span buffer data
                                                          |
                                                          v
                                                +-------------------+
                                                |                   |
                                                |  Tracing System   |
                                                |                   |
                                                +-------------------+
There are two (or maybe "two and a half") things above that need to be standardized to enable vendor-neutral tracing of sidecars.
In-band trace context encoding: The sidecar must be able to extract the trace id, span id, and propagated state (e.g., sampling bits, perhaps baggage (?)) for the RPC. It would also help to decide on a default header (or metadata map key) name for this context information.
Span buffer encoding: The sidecar generates new Span data of its own, and that data must be buffered and flushed out to an arbitrary tracing system.
The "two and a half" refers to a few remaining details that are not spoken for by these two encodings: specifically, how does the Span buffer make it to the tracing system? And what are the precise semantics of the sampling bit(s) with regard to sidecar process tracing behavior?
Prior art and requirements
There's plenty of – perhaps too much – prior art. For the in-band context data, there are well-documented formats from Zipkin (B3 and others), AWS X-Ray, OpenCensus, Microsoft, and plenty of others. For the Span buffers, there are multiple versions of Zipkin formats, an AWS X-Ray format, StackDriver Trace's format, and – again – many others.
OpenTracing would prefer not to add an N+1'th format "standard" here, though there are a few important requirements we would like to satisfy. For in-band context encoding:
- Versioning sanity: there must be a built-in forward-compatibility mechanism
- Flexible bit widths for trace and span ids: there's nothing wrong with 64bit, 128bit, or 192bit ids; since they're only used for global uniqueness, we can be more welcoming of various tracing systems through agnosticism on the bitwidth front
- "Standard" "baggage": at a bare minimum, a sampling bit; ideally a record of the sampling frequency, too. These are not usually referred to as "baggage", though semantically they're really similar.
- User-defined baggage (would-be-nice): while not absolutely essential, it's an easy win to propagate baggage (blindly) through a sidecar
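To illustrate how these four requirements could fit together, here is a sketch of a purely hypothetical text encoding (not B3, TraceContext, or any other existing format): a leading version number, `key=value` fields that a parser skips if it does not know them, hex ids of 64, 128, or 192 bits, a sampling bit plus an optional sampling probability, and percent-encoded user baggage under a `b.` prefix.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import quote, unquote

VERSION = "1"                      # hypothetical version tag
ALLOWED_ID_BITS = (64, 128, 192)   # "flexible bit widths"
_HEX = re.compile(r"[0-9a-f]+")


@dataclass
class SpanContext:
    trace_id: str                  # lowercase hex
    span_id: str                   # lowercase hex
    sampled: bool                  # "standard baggage": the sampling bit...
    sample_probability: Optional[float] = None             # ...and its rate
    baggage: Dict[str, str] = field(default_factory=dict)  # user-defined


def _check_id(hex_id: str) -> str:
    hex_id = hex_id.lower()
    if not _HEX.fullmatch(hex_id) or len(hex_id) * 4 not in ALLOWED_ID_BITS:
        raise ValueError(f"bad id {hex_id!r}")
    return hex_id


def encode(ctx: SpanContext) -> str:
    parts = [VERSION, "t=" + _check_id(ctx.trace_id), "s=" + _check_id(ctx.span_id),
             "smp=" + ("1" if ctx.sampled else "0")]
    if ctx.sample_probability is not None:
        parts.append(f"p={ctx.sample_probability!r}")
    for key, value in sorted(ctx.baggage.items()):
        # Percent-encoding keeps ";" and "=" in baggage from breaking the framing.
        parts.append(f"b.{quote(key, safe='')}={quote(value, safe='')}")
    return ";".join(parts)


def decode(header: str) -> SpanContext:
    version, *fields = header.split(";")
    if not version.isdigit() or int(version) < 1:
        raise ValueError(f"bad version {version!r}")
    known: Dict[str, str] = {}
    baggage: Dict[str, str] = {}
    for item in fields:
        key, _, value = item.partition("=")
        if key.startswith("b."):
            baggage[unquote(key[2:])] = unquote(value)
        else:
            known[key] = value  # fields added by newer versions are ignored
    if "t" not in known or "s" not in known:
        raise ValueError("missing trace or span id")
    p = known.get("p")
    return SpanContext(_check_id(known["t"]), _check_id(known["s"]),
                       known.get("smp") == "1",
                       float(p) if p is not None else None, baggage)
```

Ignoring unknown fields is what provides forward compatibility here: a parser written for version 1 can still read the ids and the sampling bit out of a version-2 header.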
For the Span buffer encoding, we would like both a compact (binary) variety and a JSON variety. Semantically, it might look a bit like this or this depending on how one weights various tradeoffs. It would be feasible and perhaps desirable to directly embed the in-band context encoding within each Span in the Span buffer, much like the OpenTracing SpanContext is encapsulated by an OpenTracing Span.
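A sketch of a Span buffer record that embeds the in-band context fields, with both a JSON and a compact binary encoding. The field set and the hand-rolled `struct` layout are assumptions made for illustration; a real format would more likely be defined in protobuf or Thrift, as discussed later in this thread.

```python
import json
import struct
from dataclasses import asdict, dataclass

_FIXED = "<?QQH"  # sampled, start_us, duration_us, len(operation); no padding


@dataclass
class BufferedSpan:
    # Embedded in-band context: the same fields a propagation header carries.
    trace_id: bytes        # 8, 16 or 24 bytes
    span_id: bytes
    parent_span_id: bytes  # empty for a root Span
    sampled: bool
    # Span-only data.
    operation: str
    start_us: int
    duration_us: int


def to_json(s: BufferedSpan) -> bytes:
    d = asdict(s)
    for key in ("trace_id", "span_id", "parent_span_id"):
        d[key] = d[key].hex()
    return json.dumps(d, separators=(",", ":")).encode()


def _length_prefixed(b: bytes) -> bytes:
    # A 1-byte length prefix keeps variable-width ids self-describing.
    return struct.pack("B", len(b)) + b


def to_binary(s: BufferedSpan) -> bytes:
    op = s.operation.encode()
    return (_length_prefixed(s.trace_id) + _length_prefixed(s.span_id)
            + _length_prefixed(s.parent_span_id)
            + struct.pack(_FIXED, s.sampled, s.start_us, s.duration_us, len(op)) + op)


def from_binary(buf: bytes) -> BufferedSpan:
    ids, pos = [], 0
    for _ in range(3):
        n = buf[pos]
        ids.append(buf[pos + 1:pos + 1 + n])
        pos += 1 + n
    sampled, start_us, duration_us, op_len = struct.unpack_from(_FIXED, buf, pos)
    pos += struct.calcsize(_FIXED)
    op = buf[pos:pos + op_len].decode()
    return BufferedSpan(ids[0], ids[1], ids[2], sampled, op, start_us, duration_us)
```

For a record with a 128-bit trace id and no parent, the binary form is 51 bytes, well under half the size of the JSON form.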
Next steps
- Discuss / debate the content above
- Come up with concrete proposals to satisfy the requirements
- Conditional on the above: converge on one of these concrete proposals (while prototyping support in linkerd and Envoy to verify sanity)
- Conditional on the above: Clean up and merge the aforementioned prototype support
Changelog / edits
- updated to generalize to third-party L7 processes
- clarified that sidecar tracing is "necessary but not sufficient"
- updated to be more general about Span buffer filling/flushing
cc some folks whom I've already spoken with about this in one way or another: @opentracing/otsc @opentracing/otiab @stevvooe @klingerf @mattklein123 @tedsuo @1mentat
Could/should the scope of this be extended to all types of L7 appliances - e.g. L7 load balancers, reverse proxies etc? They might be on separate machines or controlled by the cloud provider but from a tracing perspective they are the same as a sidecar, right?
How does it relate to w3c/trace-context#1 and TraceContext in general?
@mabn TraceContext is a way to address (1) in the diagram. It is probably the format used by OpenCensus.
I am very much in favor of discussing the approaches for standardized formats for (1) and (2), but first I want to address the following:
service mesh technology promises to become ... an important part of any tracing strategy.
I think we should make it clear that service meshes do not magically provide distributed tracing to a set of microservices that are built without distributed context propagation (DCP). To quote Envoy documentation (with my emphasis):
Applications can forward the x-request-id header for unified logging as well as tracing.
It reads like fine print, except that in reality it's not a "can", it's a "MUST", or else you get no tracing even with a service mesh. A requirement to forward headers is nothing less than a requirement for DCP.
The three basic steps of DCP are:
- extract the context from the inbound request
- propagate the context in-process (including across thread boundaries, continuations, and callbacks)
- inject the context into the outbound request
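The three DCP steps can be sketched in Python with `contextvars` for the in-process part. The header name `trace-context` and the function names are hypothetical; the point of the sketch is step 2, where work handed to a thread pool has to be run inside an explicitly copied context.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor
from typing import Dict

# Hypothetical header name; any agreed-upon key works the same way.
HEADER = "trace-context"
_current = contextvars.ContextVar("trace_context", default=None)


def extract(inbound_headers: Dict[str, str]) -> None:
    # Step 1: pull the context off the inbound request.
    _current.set(inbound_headers.get(HEADER))


def inject(outbound_headers: Dict[str, str]) -> Dict[str, str]:
    # Step 3: attach whatever context is current to the outbound request.
    ctx = _current.get()
    if ctx is not None:
        outbound_headers[HEADER] = ctx
    return outbound_headers


def make_downstream_call() -> Dict[str, str]:
    return inject({"content-type": "application/json"})


def handle_request(inbound_headers: Dict[str, str],
                   pool: ThreadPoolExecutor) -> Dict[str, str]:
    extract(inbound_headers)
    # Step 2: executor worker threads do not reliably see the submitter's
    # context variables, so copy the context and run the work inside it.
    ctx = contextvars.copy_context()
    return pool.submit(ctx.run, make_downstream_call).result()
```

Forgetting the `copy_context()` step is exactly the kind of in-process mistake that makes a trace silently break at a thread hop.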
It's not surprising that the OpenTracing API is explicitly designed to do these three things, although in-process propagation is still a work in progress (#80). So the irony is that if you want to use a service mesh for tracing because you don't want to instrument the services, you still need to instrument the services with the OT API or something equivalent, at least for DCP. This leads to the inevitable question: if the services are already instrumented with OT, then we can plug in a real OT tracer and get "classic" tracing, so why do we need to rely on a service mesh to do that?
This is not to say that having tracing capabilities in a service mesh has no benefits. On the contrary, since a service mesh takes on a lot of functionality, having it describe to the trace what it's doing with the in-flight requests is extremely valuable. There's also the option of not using a real tracer in the services, only a dummy one that merely propagates the context, which allows for an easier deployment of tracing since only the service mesh is exposed to a concrete tracing implementation.
One other thing that concerns me with the "tracing via service mesh only" approach is sampling. Not all tracing systems are built to handle 100% sampling flowing into the tracing backend. Simple probabilistic sampling doesn't work well either, especially in large architectures or in services whose endpoints have different traffic patterns. In Jaeger we support a variety of sampling strategies and recently put a lot of work into adaptive sampling, all of which requires integration with the instrumentation libraries, since the sampling decision is made in the end-user application just before a new trace is started. With tracing via service mesh, the key suggestion is that the mesh does not include any custom logic from a tracing system, so it's not clear to me how the tracing backend could affect the sampling decisions.
PS. I propose to keep this issue to the high-level discussion of the requirements and move the discussions about the actual format for in-band and out-of-band data to separate issues. I think common data formats are valuable on their own, regardless of their benefits for service mesh integration.
Haha, service mesh is very popular now. Skywalking has already been working on Linkerd instrumentation. As @bhs said, Linkerd already supports Zipkin, but our users want Skywalking to support it too. One of our community members, @hanahmily, the tech lead at dangdang.com (one of the top e-commerce companies), has already sent a proposal: apache/skywalking#379
Tracing only the service mesh would provide an easier solution for a microservice cluster. But for the trace to be unbroken, the applications must also be instrumented, maybe, as @yurishkuro said, with a dummy tracer that merely propagates the context, and maybe not, if we care about garbage collection, CPU/memory cost, etc. The service mesh instrumentation only provides the service metrics and traces (with the help of the dummy app tracer).
From the beginning, OpenTracing has been a thin standardization layer that sits between application/library code and various systems that consume tracing and causality data.
  +-------------+  +---------+  +----------+  +------------+
  | Application |  | Library |  |   OSS    |  |  RPC/IPC   |
  |    Code     |  |  Code   |  | Services |  | Frameworks |
  +-------------+  +---------+  +----------+  +------------+
         |              |            |              |
         |              |            |              |
         v              v            v              v
    +-----------------------------------------------------+
    | · · · · · · · · · · OpenTracing · · · · · · · · · · |
    +-----------------------------------------------------+
        |               |                |               |
        |               |                |               |
        v               v                v               v
  +-----------+  +-------------+  +-------------+  +-----------+
  |  Tracing  |  |   Logging   |  |   Metrics   |  |  Tracing  |
  | System A  |  | Framework B |  | Framework C |  | System D  |
  +-----------+  +-------------+  +-------------+  +-----------+
In-band trace context encoding and Span buffer encoding are implementation details. Is this really something OpenTracing should do?
Even if we focus on @yurishkuro's solution, which I think may already be the easiest way (service mesh instrumentation + a context-propagation dummy), all of them use OpenTracing, so why shouldn't both be provided by the same implementation (team, project)? If they are, (1) should not be a problem, since the tracer developers do the work to make sure they are compatible, and (2) is also not a problem, since it's their own backend.
Beyond that, I examined some implementations that can't fit these two encodings. Please note that the important points are in the discussion above ^^^.
For (1), something like w3c/trace-context#1 is a solution, but we must be very careful. Take just the trace id: 128-bit and 256-bit integer ids are very common in open source systems, but many commercial APMs don't do it that way. The contents of the HTTP headers also differ: OneAPM and skywalking APM carry other info besides traceId, spanId, sampling, and baggage items.
For (2), many APMs don't use a Span buffer; instead, they use a Part-Trace (skywalking calls it TraceSegment) buffer and flush the data to the collector when the segment finishes.
I've had some side-channel feedback, including that if you take out the words sidecar or mesh, it feels about the same. If the goal is to look closely, perhaps don't constrain it to a smaller audience, as intermediaries that do not brand themselves as sidecar or mesh have nearly the same problem space.
Personally, I'd like to see if there are more folks involved than the usual ones in OT before investing energy. Unless this is more diverse than prior work here, it could risk the same behaviors which some have labeled to me offline as the "ot club". Fixing engagement and diversity, especially with larger established players, seems a prerequisite to expanding scope further.
I just wanted to point out that some of the problems and solutions from this issue can be applied to FaaS (function as a service) solutions as well, like OpenWhisk, so we might want to keep that in mind when discussing the actual standards.
For a FaaS, the situation is a bit better as there's usually a coordinator that invokes the functions, so, it's able to create/propagate the context. It can't, however, automatically inject a context on outgoing remote calls made by functions themselves.
To clarify some of the background..
Linkerd actually supports http and kafka, too via https://github.com/linkerd/linkerd-zipkin. This is a drop-in plugin thing.. people don't need to recompile. We should also note that there is a standard of practice of B3 headers, something that vendors and clones employ today. A number of tools in and outside OpenTracing are compatible with B3 headers produced by envoy, etc. A number of tools in and outside OpenTracing accept zipkin's v1 format over http (exported by both envoy and linkerd). So, even if "all bespoke and non-standardized", there's a standard of practice in the two mentioned meshi which is why they are compatible enough to join traces across (*). This didn't happen by accident, it happened because folks decided to do something compatible. I'm not suggesting these prior formats are what's desired in OT, just I think the intro doesn't make clear that the two meshi are already compatible with trace formats used by applications and with several backends which don't share code.
(*) Hand-waving a bit here, as I haven't tried to join traces between the two; I just know that Envoy processes X-B3 headers and linkerd is built on Finagle, which defined them. There could be a nuance that prevents them from joining traces.
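For reference, the B3 "standard of practice" mentioned above uses separate X-B3-* headers. Below is a minimal sketch of reading and writing them; it covers only the multi-header form with 64- or 128-bit trace ids, and the `B3Context`, `extract_b3`, and `inject_b3` names are hypothetical, not part of any Zipkin library.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass
class B3Context:
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    sampled: Optional[bool]  # None means no decision was made upstream
    debug: bool


def extract_b3(headers: Mapping[str, str]) -> Optional[B3Context]:
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    trace_id, span_id = h.get("x-b3-traceid"), h.get("x-b3-spanid")
    if not trace_id or not span_id or len(trace_id) not in (16, 32):
        return None  # missing ids, or not a 64/128-bit trace id
    debug = h.get("x-b3-flags") == "1"
    raw = h.get("x-b3-sampled")
    if debug or raw in ("1", "true"):  # debug implies sampled
        sampled: Optional[bool] = True
    elif raw in ("0", "false"):
        sampled = False
    else:
        sampled = None
    return B3Context(trace_id, span_id, h.get("x-b3-parentspanid"), sampled, debug)


def inject_b3(ctx: B3Context, headers: Dict[str, str]) -> Dict[str, str]:
    headers["X-B3-TraceId"] = ctx.trace_id
    headers["X-B3-SpanId"] = ctx.span_id
    if ctx.parent_span_id:
        headers["X-B3-ParentSpanId"] = ctx.parent_span_id
    if ctx.debug:
        headers["X-B3-Flags"] = "1"  # no separate X-B3-Sampled when debugging
    elif ctx.sampled is not None:
        headers["X-B3-Sampled"] = "1" if ctx.sampled else "0"
    return headers
```

Note the fixed 64/128-bit trace id widths, which is one of the gaps that the flexible bit-width requirement in the issue text is aimed at.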
Thanks for the comments, all.
Could/should the scope of this be extended to all types of L7 appliances - e.g. L7 load balancers, reverse proxies etc? They might be on separate machines or controlled by the cloud provider but from a tracing perspective they are the same as a sidecar, right?
Yes, agreed. I've updated the issue text to reflect this.
Re the "necessary but not sufficient" aspects of tracing sidecars, I updated the issue itself.
Re your concerns about sampling, TL;DR I hear you. At least when the sidecar is not the root of the trace (i.e., when the sampling decision has already happened) it's straightforward to describe a default behavior (i.e., to respect and propagate the sampling decision).
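A sketch of that default behavior, using a hypothetical `sidecar_sampling_decision` helper: respect and propagate an upstream decision when there is one, and only when the sidecar is the trace root fall back to a fixed probabilistic rate, recording that rate so it can travel with the context (the "record of the sampling frequency" requirement in the issue). It deliberately does not address the harder point about adaptive, backend-driven sampling.

```python
import random
from typing import Optional, Tuple


def sidecar_sampling_decision(upstream_sampled: Optional[bool],
                              upstream_probability: Optional[float],
                              local_rate: float,
                              rng: random.Random) -> Tuple[bool, Optional[float]]:
    """Return (sampled, probability_to_propagate) under a hypothetical policy."""
    if upstream_sampled is not None:
        # Not the root: the decision was already made upstream; keep it as-is.
        return upstream_sampled, upstream_probability
    if not 0.0 <= local_rate <= 1.0:
        raise ValueError("local_rate must be in [0, 1]")
    # Root of the trace: decide locally and remember the rate that was used.
    return rng.random() < local_rate, local_rate
```

Recording the probability lets a backend weight each sampled trace as standing for roughly 1/p traces.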
... if the services are already instrumented with OT, then we can plug in a real OT tracer and get the "classic" tracing so why do we need to rely on a service mesh to do that?
In practice, I've seen that service mesh is a nice way to extend tracing "one more edge out" (in the graph model where processes are nodes and RPCs are edges); even if the peer on the other side may not properly support tracing.
PS. I propose to keep this issue to the high-level discussion of the requirements and move the discussions about the actual format for in-band and out-of-band data to separate issues.
Yes, definitely. I intentionally stayed away from proposing solutions here (with a similar intent).
Re "sidecar" vs other types of processes, this is basically the same sort of thing that @cwe1ss and @jpkrohling pointed out (I adjusted some of the language in the issue to generalize a bit).
Re OpenTracing scope: this issue exists because people keep on asking for a clearer answer from OpenTracing in this area... if you don't want to be a part of that discussion, of course that is absolutely your prerogative. If there are others you'd particularly like to involve, send me a list and I'll send them a note.
FYI, Envoy basically does exactly what you're describing re a "context-propagation dummy"... there's a limited Envoy-authored context-propagation library which can handle the simple cases. I would prefer that OpenTracing handle the in-process propagation in the endgame, but at some level it's not required.
Re your point about commercial APMs: yes, that's true. I will change the issue to be less specific about how Spans get out of the process. The point is really just that they're not required to flush synchronously.
Hi,
I'm going to try to mostly leave this conversation to the experts, but FWIW, here is the benefit I see of continuing to go after something like @bhs proposes:
- It's definitely true that intermediaries cannot do full traces without the app forwarding context. However, I would argue that telling app developers to propagate the "x-propagate-me" header is often an easier sell than a full OT/other library integration (this is basically what we do at Lyft). Of course, users can use more robust libraries to forward context and add child spans if they want (we also do this at Lyft in some places). So, in answer to @yurishkuro, I do still see a lot of value in having a common standard where intermediaries can form a full trace with only the absolute minimum app effort. (This is basically what @bhs just said in his reply about the dummy context.)
- Per @adriancole, it's true that Zipkin JSON is widely supported, and will likely continue to be in Envoy in perpetuity. However, from a perf perspective, I think it would be highly beneficial to develop a common binary ingestion format (whether that be based on gRPC or something else).
- I still would very much like to agree on a common header name for context as part of this proposal. It's just easier for app writers to roll their own propagation if they know what it is going to be. Again, I agree with @adriancole that this is not required, since the app libraries can be configured to do things with arbitrary headers, but it's a nice-to-have.
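What "just forward this header" asks of the application can be sketched as a hypothetical helper that copies a configured set of headers from the inbound request to each outbound one. The header list is an assumption (x-request-id from the Envoy quote earlier in the thread plus a made-up `trace-context`), and the sketch deliberately leaves out the in-process step from the DCP list: getting `inbound` to wherever the outbound call is made.

```python
from typing import Dict, Iterable, Mapping

# Assumed list; a mesh would document which headers apps must forward.
PROPAGATED_HEADERS = ("x-request-id", "trace-context")


def forward_headers(inbound: Mapping[str, str],
                    outbound: Dict[str, str],
                    names: Iterable[str] = PROPAGATED_HEADERS) -> Dict[str, str]:
    """Copy the listed headers (matched case-insensitively) to `outbound`.

    This is the entire tracing-related work the app does; the sidecars on
    either side create the Spans."""
    lowered = {k.lower(): v for k, v in inbound.items()}
    for name in names:
        value = lowered.get(name.lower())
        if value is not None:
            outbound[name] = value
    return outbound
```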
Thanks,
Matt
However, I would argue that telling app developers to propagate the "x-propagate-me" header is often an easier sell than a full OT/other library integration (this is basically what we do at Lyft).
That has not been my experience. First, they need to know the exact name(s) of the headers, and let's be frank: these things do tend to change over time, standards notwithstanding. Second, they still need to understand how to propagate the context in process, with all the bells and whistles of threading, continuations, callbacks, etc. These are the things that most developers (a) don't need to know how to do, and (b) will most likely do incorrectly if they do it manually. So while intuitively "propagate header X" seems simpler, in practice it is much better achieved by a library (even just from a code-reuse point of view). I am very happy that ot-contrib is reaching the point where, when some teams at my company come out of the woods with requests like "can you support this framework", I can just point them to some ot-contrib module, they change a couple of lines in the app (no knowledge of headers needed), and the context starts to flow through.
Anyway, as I said earlier, I do see a lot of value in standard in-band and out-of-band formats, and if it helps sidecars, so much the better.
I would prefer that OpenTracing handle the in-process propagation in the endgame, but at some level it's not required.
Re @bhs: I am open to how OpenTracing handles in-process propagation :) I just think this mechanism should be optional, because many commercial and VM-based agents (Java, C#, PHP, Node.js auto-instrumentation agents) are not included in application libraries, so the OT lib is a bridge for them. I am sure the skywalking and Instana agents work like this today.
from a perf perspective, I think it would be highly beneficial to develop a common binary ingestion format (whether that be based on gRPC or something else)
Re @mattklein123 @adriancole: most commercial APMs already do it that way, and skywalking already finished switching away from JSON in 3.2. In this area, my bigger concern is how to define the encoding if we care about performance and use gRPC, Thrift, or something else rather than JSON.
BTW, AppDynamics doesn't even upstream traces in a Trace or Span format; it's more like a sub-line of the trace tree. This is not an official statement, but I checked the upstream packages from the AppDynamics agent: it is a path separated by |, including the node ids, sequence, and costs.
@mattklein123 @wu-sheng fwiw, I didn't mean that we shouldn't make a binary out-of-band format. For example, in zipkin v2 we have a TODO item for a proto definition (similar to how we have thrift defined for v1 today). Incidentally, the proto was driven more by its inherent versioning capabilities than by compaction. Regardless, the trace-context working group, which started last year as a google doc, is different from out-of-band formats, and a bit more important than them.
During distributed tracing working group workshops, we found that the reason for decoupling out-of-band formats (like the zipkin format) from in-band ones (like trace-context or B3) is the following: some APMs and intermediaries will not use another out-of-band format natively, but may support exporting to one for interop reasons. Propagation has many concerns around it, which is why it has been our focus since the beginning, and incidentally one without a tracing API agenda. One thing lately (this year) is that several people have mentioned interest in "propagation" à la baggage, decoupled from the tracer implementation API, or at least a format for it. This dovetails into the existing work we have in progress for the trace-context spec, but is independent of it, as this could also push headers like Envoy's or linkerd's, a deadline context, a dynamic route, etc. Basically, there are many more stakeholders in this than OpenTracing, and than "just tracing" for that matter.
Anyway, we've already planned to sync again on this in Nov. Feel free to join remote, which is easier for you @wu-sheng as we chose an asia-pac friendly time! Obviously, if anything interesting comes out of this, we can talk about that, too.
I am happy to join your sync if the time works, but I am not optimistic about that (company... you know). But even so, we can discuss after the sync :)
Cool! This is very much along the same lines as the thinking we (Google, others) have had with TraceContext (as @mabn and @yurishkuro pointed out, and to which we just posted an update) and Census, which implements TraceContext. Census of course is designed for a slightly more general scenario and can be linked to an app's code, but it follows the same general architectural patterns outlined here and is applicable to working as a sidecar as well. We've deliberately kept TraceContext in a different org than Census in the hopes that other groups will adopt it as well.
We haven't talked a whole lot about the OpenCensus project outside of the tracing workshops, mostly due to the fact that the project has been in a rather early stage of development thus far. At its core, the project will provide language specific libraries, each with automatic instrumentation hooks for popular web / RPC frameworks, APIs for interacting with traces and metrics, and exporters to backends for analysis. Census stems from the metrics and tracing instrumentation and tooling that exist inside of Google (Dapper, for which it's used as a sidecar), and it will be replacing our internal instrumentation at Google and the Stackdriver Trace SDKs once it matures. It'll work with any vendor / backend that has an exporter available.
Anyway, glad to see that we're all focusing on the same scenarios; the value of having a common context format alone is huge, as we've discussed previously. Instrumenting services without linking to their code (either because they're pass-through components like load balancers, a vendor-managed API or storage service, or a mesh service being instrumented via a sidecar) is also going to be a big deal going forward, and is part of the reason why we're so focused on making Census available to developers and have been meeting with vendors and projects about the TraceContext format.
Happy to chat more or talk about collaboration if others are interested! I don't want to derail / sidetrack this thread - https://gitter.im/census-instrumentation/Lobby is likely the best place to reach out or you can ping me directly.
A few thoughts:
There should be a distinction between proxies that can modify content (safely) and ones that cannot.
For proxies that can modify content, they should join the trace as full members, rewriting the appropriate headers to inject their own span's context instead of just joining the one that was passed in. Following the best practices as I understand them, they should open a server span when they receive the request and a client span for each "request-like action" that they take outbound, so that application-level proxy retries are visible as separate spans and separately joinable by downstream servers. Ideally, there should be a run-time or configure-time ability to add additional proxy spans for steps in the proxy process that could affect request handling time, for instance downstream selection or header parsing/rewriting.
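The span structure described above can be sketched as follows: one server span for the inbound request and one client span per outbound attempt, with each attempt's own span id sent downstream. `proxy_with_retries`, `Span`, and the sequential id generator are hypothetical stand-ins, not any tracer's API; `send` represents the outbound request.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, List, Optional

_ids = itertools.count(1)


def new_id() -> str:
    # Sequential ids keep the example readable; real tracers use random ids.
    return f"{next(_ids):016x}"


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    kind: str  # "server" or "client"
    name: str
    error: bool = False


def proxy_with_retries(trace_id: str, incoming_parent: Optional[str],
                       send: Callable[[str, str], bool], max_attempts: int,
                       finished: List[Span]) -> bool:
    """send(trace_id, parent_span_id) -> success, carrying the client span's context."""
    server = Span(trace_id, new_id(), incoming_parent, "server", "proxy.inbound")
    ok = False
    for attempt in range(1, max_attempts + 1):
        # One client span per attempt, so retries are visible and each
        # downstream server joins under the attempt that reached it.
        client = Span(trace_id, new_id(), server.span_id, "client",
                      f"proxy.attempt.{attempt}")
        ok = send(trace_id, client.span_id)
        client.error = not ok
        finished.append(client)
        if ok:
            break
    server.error = not ok
    finished.append(server)
    return ok
```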
For proxies that cannot modify content or proxy span injectors (like haproxy-log2span), they should create spans for proxy processes but they will have parentage from whatever the most recent full member injected and will only be included in trees in a best effort sort of way.
I agree that ideally we'd have standard headers to make propagation easier in at least HTTP and ideally SIP and others.
Quick additional question as I look back on the comments: are we also trying to specify cross-platform joins? That seems like a much harder question than how sidecars can participate in any particular implementation's collection/propagation scheme.
Thanks, all, for chiming in! ^^^
In-band data
@mtwo @bogdandrutu @adriancole, looking at https://github.com/TraceContext/tracecontext-spec/blob/master/HTTP_HEADER_FORMAT.md there are two unmet needs I'm aware of:
- varlen ids
- baggage
(PS: @mattklein123 I know you want a header name standard, and that HTTP_HEADER_FORMAT.md proposes one: Trace-Context; works for you?)
I would also like a richer way to record sampling decisions (i.e., probability info as well as the yes/no decision), but the nice thing about a generic baggage mechanism is that it can be used for that. So no new requirements, I guess.
Is there willingness to entertain changes like these in tracecontext-spec? If so, I can put together a PR. If not, I guess (?) a small version range could be reserved for an experimental format??
Out-of-band ("Span buffer") data
Per something @mattklein123 wrote:
from a perf perspective, I think it would be highly beneficial to develop a common binary ingestion format
I strongly agree with this... concretely, a JSON-based format is a dollar-cost non-starter for many production tracing deployments I know well.
Devising a .proto format seems straightforward and doesn't need to start life as part of opentracing/specification. For instance, there could be an opentracing-contrib/encoding repository (I'm not attached to the name) with .proto files for Span buffers and perhaps an opt-in gRPC service for transmitting them. If it catches on, great; maybe there's reason to consider an OpenTracing scope expansion at that point. If it doesn't catch on, fine; something else will, and when people ask, OpenTracing can point them in that direction instead.
@mattklein123 I know you want a header name standard, and that HTTP_HEADER_FORMAT.md proposes one: Trace-Context; works for you?
Sure!
@bhs we are happy to accept PRs (but if possible I would like a short description of what you want to do beforehand; maybe open an issue with your suggestions).
@mattklein123 our own "SpanBuffer" format is defined here (maybe you will find something useful there and use some ideas). https://github.com/census-instrumentation/opencensus-proto/blob/master/trace/trace.proto
+1 for common binary ingestion and context formats, etc.
Hi guys. I'm sorry if I'm late to the party, but reading the thread here leaves one question open.
If there is a plugin that wraps Linkerd apps with Zipkin-style B3 headers, what is mechanically different about OT that prevents implementing the same thing with another naming convention?
I think there's even a PR open for that. Is there something preventing the project from "getting a foot in the door" by adding naive tracing (not aware of the app, akin to the existing approach) and then 1) actually developing the means to do OT-specific context propagation and 2) offering it for optional use?