openzipkin/b3-propagation

b3 single header format

codefromthecrypt opened this issue · 29 comments

Design moved here https://cwiki.apache.org/confluence/display/ZIPKIN/b3+single+header+format

In designing the Trace Context format, we made a section called tracestate which holds the authoritative propagation data.

This issue defines a value that could be used as a separate "b3" header, and would be the same value used in the w3c tracestate field. Specific to the w3c format, this holds data not in the "traceparent" format, such as parent ID and the debug flag. It would be a completely non-lossy way to allocate our current headers into one value.

In simplest terms it is a mapping:

b3={x-b3-traceid}-{x-b3-spanid}-{if x-b3-flags 'd' else x-b3-sampled}-{x-b3-parentspanid}, where the last two fields are optional.

For example, the following headers:

X-B3-TraceId: 80f198ee56343ba864fe8b2a57d3eff7
X-B3-ParentSpanId: 05e3ac9a4f6e3b90
X-B3-SpanId: e457b5a2e4d86bd1
X-B3-Sampled: 1

Become one header or state field. For example, as a header:

b3: 80f198ee56343ba864fe8b2a57d3eff7-e457b5a2e4d86bd1-1-05e3ac9a4f6e3b90

Or, if using the w3c Trace Context format:

tracestate: b3=80f198ee56343ba864fe8b2a57d3eff7-e457b5a2e4d86bd1-1-05e3ac9a4f6e3b90

Here are some more examples:

A sampled root span would look like:
b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-1

A not yet sampled root span would look like:
b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7

A debug RPC child span would look like:
b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-d-5b4185666d50f68b
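The three examples above can be reproduced with a small formatter. This is only a sketch of the mapping, not Brave's code: the class name `B3SingleFormatSketch` and the `format` helper are hypothetical, and it assumes the sampling field is passed as null (no decision yet), "0", "1" or "d". It also assumes a parent ID is only written when the sampling field is present, since the fields are positional.

```java
final class B3SingleFormatSketch {
  /**
   * Joins B3 fields into the single-header value.
   *
   * sampling: null (no decision yet), "0", "1" or "d" (debug).
   * parentId: null for root spans.
   */
  static String format(String traceId, String spanId, String sampling, String parentId) {
    StringBuilder b3 = new StringBuilder(traceId).append('-').append(spanId);
    if (sampling == null) {
      // Positional encoding: the parent ID slot comes after the sampling slot.
      if (parentId != null) {
        throw new IllegalArgumentException("parentId requires a sampling field");
      }
      return b3.toString();
    }
    b3.append('-').append(sampling);
    if (parentId != null) b3.append('-').append(parentId);
    return b3.toString();
  }
}
```

For instance, `format(traceId, spanId, "d", "5b4185666d50f68b")` with the trace and span IDs above would give the debug RPC child value, and passing null for both optional arguments would give the "not yet sampled" root value.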

Like normal B3, it is valid to omit trace identifiers in order to only propagate a sampling decision. For example, the following are valid downstream hints:

  • don't sample - b3: 0
  • sampled - b3: 1
  • debug - b3: d
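Here is a rough sketch of a parser that accepts both the full value and the sampling-only hints listed above. The names (`B3SingleParserSketch`, `Parsed`, `checkHex`) are made up for illustration, and the validation rules are assumptions: trace IDs of 16 or 32 lower-hex characters, span and parent IDs of 16.

```java
final class B3SingleParserSketch {
  /** Parsed fields; any of them may be null when absent from the header value. */
  static final class Parsed {
    final String traceId, spanId, sampling, parentId;

    Parsed(String traceId, String spanId, String sampling, String parentId) {
      this.traceId = traceId;
      this.spanId = spanId;
      this.sampling = sampling;
      this.parentId = parentId;
    }
  }

  static Parsed parse(String b3) {
    if (b3.length() == 1) { // sampling-only hint: "0", "1" or "d"
      return new Parsed(null, null, checkSampling(b3), null);
    }
    // A limit of -1 keeps empty trailing fields, so "a-b-" fails validation.
    String[] parts = b3.split("-", -1);
    if (parts.length < 2 || parts.length > 4) {
      throw new IllegalArgumentException("expected 2 to 4 fields: " + b3);
    }
    String traceId = checkHex(parts[0], 16, 32);
    String spanId = checkHex(parts[1], 16, 16);
    String sampling = parts.length > 2 ? checkSampling(parts[2]) : null;
    String parentId = parts.length > 3 ? checkHex(parts[3], 16, 16) : null;
    return new Parsed(traceId, spanId, sampling, parentId);
  }

  static String checkSampling(String s) {
    if (s.equals("0") || s.equals("1") || s.equals("d")) return s;
    throw new IllegalArgumentException("invalid sampling field: " + s);
  }

  static String checkHex(String s, int shortLength, int longLength) {
    if (s.length() != shortLength && s.length() != longLength) {
      throw new IllegalArgumentException("unexpected length: " + s);
    }
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      boolean lowerHex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
      if (!lowerHex) throw new IllegalArgumentException("not lower-hex: " + s);
    }
    return s;
  }
}
```

With this sketch, `parse("d")` yields only a sampling field, while a two-field value yields identifiers with a null sampling field (no decision yet).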

NOTE this does not match the prefix of traceparent, so we must define ours independently and consider that the w3c may change in different ways. This is ok as the "tracestate" entries in w3c format are required to be treated opaque. In other words we can be different on purpose or by accident of drift in their spec.

On positional encoding vs nested key/values

Positional encoding is more space efficient and less complicated to parse vs key/value encoding. For example, the AWS format code in brave is complex due to splitting, dealing with white space etc. Positional is simple to parse and straight-forward to map. Rationale is same as w3c traceparent for the most part.

Unlike new specs, we expect no additional fields. B3 is a very stable spec and we are not defining anything new except how to encode it. For this reason, positional should be fine.

On putting mandatory fields up front

The trace ID and span ID fields are the only mandatory fields. This would allow fixed-length parsing for those just correlating on these values. Usually parentid is not used for correlation, rather scraping. Moreover, this is easier for existing proxies who only create trace identifiers.

Ex: you can control trace identifiers without making a sampling decision like so:

# root trace and span IDs
b3: 80f198ee56343ba864fe8b2a57d3eff7-05e3ac9a4f6e3b90
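To illustrate the fixed-length point: a reader that only correlates on IDs could slice the first two fields by position and never look at the rest. A minimal sketch, assuming trace IDs are 16 or 32 hex characters and span IDs are 16; `B3CorrelationSketch` is a hypothetical name and the sketch deliberately skips hex validation.

```java
final class B3CorrelationSketch {
  /** Returns {traceId, spanId}, or null when the value carries no identifiers. */
  static String[] traceAndSpanId(String b3) {
    int hyphen = b3.indexOf('-');
    // The first hyphen must follow a 16 or 32 character trace ID.
    if (hyphen != 16 && hyphen != 32) return null;
    int spanEnd = hyphen + 1 + 16;
    if (b3.length() < spanEnd) return null;
    // Anything after the span ID must start a new field.
    if (b3.length() > spanEnd && b3.charAt(spanEnd) != '-') return null;
    return new String[] {b3.substring(0, hyphen), b3.substring(hyphen + 1, spanEnd)};
  }
}
```

Sampling-only values such as `0` contain no hyphen, so this sketch returns null for them.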

On sampled before parent span ID

When name-values aren't used, it could be confusing which of the equal length fields are the parent. By placing the sampled flag in-between, we make this more clear. Also, it matches the prefix of the current traceparent encoding.

Encoding "not yet sampled"

Leaving out the single-character sampled field is how we encode the "no decision" state. This matches the way we used to address this (by leaving out X-B3-Sampled).

Encoding debug

We encode the debug flag (previously X-B3-Flags: 1), as the letter 'd' in the same place as sampled ('1'). This is because debug is a boosted sampled signal. Most implementations record it out-of-band as Span.debug=true to ensure it reaches the collector tier.

# force trace on a root span
b3: 80f198ee56343ba864fe8b2a57d3eff7-05e3ac9a4f6e3b90-d

One alternative considered was adding another field just to hold debug (ex a trailing -1). Not only was this less intuitive, it made parsing harder especially as parentId is also optional. This was reverted in openzipkin/brave#773
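One way to picture this is a single sampling state with four values, where debug is the strongest. The enum below is an illustrative sketch, not an existing Brave type: `SamplingStateSketch` and its methods are hypothetical, and `fromLegacy` assumes X-B3-Sampled only carries "0" or "1".

```java
enum SamplingStateSketch {
  UNDECIDED(null), NOT_SAMPLED("0"), SAMPLED("1"), DEBUG("d");

  /** Character written in the third position; null means the field is left out. */
  final String field;

  SamplingStateSketch(String field) {
    this.field = field;
  }

  /** Returns null when no decision was made; debug implies sampled. */
  Boolean sampled() {
    if (this == UNDECIDED) return null;
    return this == SAMPLED || this == DEBUG;
  }

  static SamplingStateSketch fromField(String field) {
    if (field == null) return UNDECIDED;
    switch (field) {
      case "0": return NOT_SAMPLED;
      case "1": return SAMPLED;
      case "d": return DEBUG;
      default: throw new IllegalArgumentException("invalid sampling field: " + field);
    }
  }

  /** Maps the X-B3-Sampled and X-B3-Flags header values to one state. */
  static SamplingStateSketch fromLegacy(String sampled, String flags) {
    if ("1".equals(flags)) return DEBUG; // X-B3-Flags: 1 takes precedence
    if (sampled == null) return UNDECIDED;
    return "1".equals(sampled) ? SAMPLED : NOT_SAMPLED;
  }
}
```

Because there is only one slot, a reader never has to reconcile a separate debug field with the sampled field.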

W3C drift alert

While we should watch out for changes in the TraceContext spec, for example, if they add a "priority flag", we should keep our impl independent. B3 fields haven't changed in years and we can lock in something far safer knowing that.

Why also define as a separate header

We have had continual problems with b3 with technology like JMS. In addition to declaring this format for w3c Trace Context, we could use it right away as the header "b3". This would solve all the problems we have like JMS hating hyphens in names, and allow those who opt into it a consistent format for when they transition to w3c.

openzipkin/brave#584

In other words, in messaging propagation and even normal http, some libraries could choose to read the "b3" header for the exact same format instead of "X-B3-X"

Should we use flags instead of two fields for sampled and debug?

We could encode the three sampled states and debug as a single 8-bit field encoded as hex. If we used flags, an example sampled span would be:

b3: 80f198ee56343ba864fe8b2a57d3eff7-05e3ac9a4f6e3b90-3-e457b5a2e4d86bd1

Notice this is 3, not 1. That's because if using bit field we need to tell the difference between unsampled and no sampling decision. This could be confusing to people.
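To make the arithmetic concrete, here is one bit layout that would produce 3 for a sampled span. The bit positions are an assumption for illustration only (nothing in this proposal fixes them), and `SampledBitsSketch` is a hypothetical name.

```java
final class SampledBitsSketch {
  static final int SAMPLED = 1;            // bit 0 (assumed position)
  static final int DECISION_MADE = 1 << 1; // bit 1 (assumed position)

  /** Encodes the three sampled states as a hex digit; null means no decision yet. */
  static String encode(Boolean sampled) {
    int flags = 0;
    if (sampled != null) {
      flags |= DECISION_MADE;
      if (sampled) flags |= SAMPLED;
    }
    return Integer.toHexString(flags);
  }
}
```

Under this layout sampled is 3 (both bits set), unsampled is 2 and no decision is 0, so a bare 1 would mean "sampled, but no decision made", which is the kind of ambiguity that could confuse people.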

On flags: in Java, we already encode sampled and debug state internally as flags like this:

// Class name is illustrative.
final class SamplingFlags {
  static final int FLAG_SAMPLED = 1 << 1;
  static final int FLAG_SAMPLED_SET = 1 << 2;
  static final int FLAG_DEBUG = 1 << 3;

  // Returns null when no sampling decision has been made yet.
  static Boolean sampled(int flags) {
    return (flags & FLAG_SAMPLED_SET) == FLAG_SAMPLED_SET
        ? (flags & FLAG_SAMPLED) == FLAG_SAMPLED
        : null;
  }

  static boolean debug(int flags) {
    return (flags & FLAG_DEBUG) == FLAG_DEBUG;
  }
}

This might be better off than having two fields, although it is less simple as people often make mistakes coding bit fields, and X-B3-Flags caused confusion many times here including #20.

What about finagle's flags?

If we used flags, we could also do it the same way as finagle does, except I think it would be confusing as the length they allocated (64 bits or 16 characters in hex) was never used in practice.

In practice we could use a single hex character to encode all the flags in our format (that supports 8 flags). Also, using Finagle's flag encoding could further confusion about the "X-B3-Flags" header, which in http encoding never has a value besides "1" (#20). At any rate, we can consider using the first 8 bits of their format as prior art regardless of if we use it.

  /*
   * The debug flag is used to ensure the current trace passes
   * all of the sampling stages.
   */
  val Debug = 1L << 0 // 1

  /**
   * Reserved for future use to encode sampling behavior, currently
   * encoded explicitly in TraceId.sampled (Option[Boolean]).
   */
  val SamplingKnown = 1L << 1
  val Sampled = 1L << 2

cc @openzipkin/core @openzipkin/instrumentation-owners for feedback

rewrote the description with more examples

Notes on implementation once we settle on things, assuming we decide to introduce a "b3" header in addition to the same value in tracestate format.

  • If you receive a "b3" header, ignore any "X-B3-*" headers
  • If you want to write in brown-field, write both "b3" and "X-B3-*" headers on the way out. In cases where you can't anyway (like JMS), just write "b3"
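The two rules above could look roughly like this. It is a sketch under assumptions, not Brave's API: `B3HeaderPrecedenceSketch`, `extract`, `inject` and `join` are hypothetical names, headers are modelled as a plain map with exact-case keys, and X-B3-Sampled is assumed to carry only "0" or "1".

```java
final class B3HeaderPrecedenceSketch {
  /** Returns the single-format value to act on, or null when nothing was propagated. */
  static String extract(java.util.Map<String, String> headers) {
    String b3 = headers.get("b3");
    if (b3 != null) return b3; // "b3" wins; any X-B3-* headers are ignored

    String sampling = "1".equals(headers.get("X-B3-Flags")) ? "d" : headers.get("X-B3-Sampled");
    String traceId = headers.get("X-B3-TraceId");
    String spanId = headers.get("X-B3-SpanId");
    if (traceId == null || spanId == null) return sampling; // sampling hint only, or null
    return join(traceId, spanId, sampling, headers.get("X-B3-ParentSpanId"));
  }

  /** Writes "b3" and, in brown-field deployments, the X-B3-* headers as well. */
  static void inject(java.util.Map<String, String> out, String traceId, String spanId,
      String sampling, String parentId, boolean alsoLegacy) {
    out.put("b3", join(traceId, spanId, sampling, parentId));
    if (!alsoLegacy) return; // e.g. JMS, where hyphenated names are a problem
    out.put("X-B3-TraceId", traceId);
    out.put("X-B3-SpanId", spanId);
    if (parentId != null) out.put("X-B3-ParentSpanId", parentId);
    if ("d".equals(sampling)) {
      out.put("X-B3-Flags", "1");
    } else if (sampling != null) {
      out.put("X-B3-Sampled", sampling);
    }
  }

  /** A parent ID without a sampling field cannot be expressed positionally, so it is dropped. */
  static String join(String traceId, String spanId, String sampling, String parentId) {
    String value = traceId + "-" + spanId;
    if (sampling == null) return value;
    value += "-" + sampling;
    return parentId == null ? value : value + "-" + parentId;
  }
}
```

A real HTTP integration would also need case-insensitive header lookup, which this sketch leaves out.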

At least looking at brave, this would be easy because there is one codebase parsing headers. A quick check for a "b3" header would be easy to do, and would be contained in B3 code, which is independent of any other propagation code anyway. Libraries that hard-code B3 handling might be impacted more. For this reason we would need a tracking issue and, as with other changes such as 128-bit trace IDs, we'd expect it to take a long time before all libraries support this.

posted an email to alert those not watching this repo https://groups.google.com/forum/#!topic/zipkin-dev/EdZjqHXuXsg

+1 definitely

I am positive and agree with putting X-B3-TraceId, X-B3-ParentSpanId, X-B3-SpanId and X-B3-Sampled into a single field. SkyWalking has a similar sw3 format.

This will make it much easier for me to start a new SkyWalking feature, which could generate a new ref in EntrySpan, maybe named OTHER-TRACER-REF, with system=Zipkin.

+1

on the separate sampled, debug vs flags I'm indifferent. I see pros and cons on each

jsw commented

This is great.

Would this new header eventually be considered the preferred propagation format over the existing headers? Would it be reasonable and possible to just use this header in an ecosystem where all applications are using updated tracing libraries?

+1

I'm for this, but I have two concerns:

  1. I'd like to make sure that we have clarifying examples of when some fields can be missing and when they can't. e.g. "Optional fields MUST appear in the correct position. e.g. If x-b3-parentspanid is present, then x-b3-sampled must also be present"

  2. By supporting a new header, we'll eventually be supporting three header formats instead of two: X-B3-*, b3, and trace-state/trace-parent. This imposes a slightly larger burden for brown fields. I don't think that it is much, but it's more than just frameworks. Proxies, load balancers, logging frameworks, etc. may need to handle them as well.

This looks good, although I'm unsure about how to handle some corner case scenarios.

For example, what would I do if I want to specify "no decision" on sampling, but also want to specify parent ID?

Or your force trace on a root span example: 80f198ee56343ba864fe8b2a57d3eff7-05e3ac9a4f6e3b90-1-1. It appears that in this case the fourth slot is the debug flag, but in all other cases it's parent ID. Should it maybe be 80f198ee56343ba864fe8b2a57d3eff7-05e3ac9a4f6e3b90-1--1 (double dash - before the final 1) so that implementers can split on dash - and always look in the fourth slot for parent ID with the caveat that it might be empty since it's optional?

Maybe that's a potential solution for the other issue as well...to specify "no decision" sampling but include a parent ID: 80f198ee56343ba864fe8b2a57d3eff7-05e3ac9a4f6e3b90--e457b5a2e4d86bd1?

I guess my concern is very close to @bplotnick's concern #1, but I think it would be a valid use case to want "no decision" sampling but still want to specify parent ID, and there are probably other mix & match scenarios of the optional fields that should be supported. I think the double-dash to omit an optional field while still allowing you to specify one of the later optional fields could work and wouldn't be too confusing.

I'm also happy if someone thinks of another solution that allows these use cases.

Ah ok, I guess if it's an antipattern that lessens the concern for that use case (it's not something that we do, just looking for corner cases). The force trace on a root span example still nags at me though - I don't like the idea of needing to have branching logic on the fourth position to figure out if it's parent ID or debug flag, and it's ultimately the same issue of wanting to omit an earlier optional field while specifying a later one.

I think I like the ? idea best of all, and it could work for any of the optional fields. Having a ? for any of the optional fields could be equivalent to omitting that optional field. That would then allow you to specify the positional nature of all the fields in a consistent way, no exceptions (third position is always sampled, fourth position is always parent ID, fifth position is always flags). If the positions aren't consistent then you'll have to add caveat wording anytime you mention parent ID or flags, otherwise IMO people will miss the part of the header definition that calls it out and we'll get inconsistent handling.

I don't think sampled should necessarily be required.

I do like putting parent ID after sampled to help reduce confusion with span ID, and ordering fields based on importance. Makes sense. 👍

I will implement this in Brave as an optional feature. However, it would be used by default in JMS.

In existing practice, it is ok to go without a trace ID, if sampling or debug is set. This was highlighted by @narayaruna at netflix as they have concerns with overhead of ID generation to just propagate a "don't sample" decision. This is also quite important for some messaging use cases.

To make this fully portable, we'll need to accept the following special cases:

  • sampled -> b3: 1
  • not sampled -> b3: 0
  • debug -> b3: 1-1

(again it is invalid to say debug and not sampled, so we won't address 0-1)

One clarification on the parentId being the strange field.. We'd expect messaging propagation to not send parentId at all. RPC spans share span IDs, but messaging spans always fork a new ID (for consumption of message). The parentId of the caller isn't read ever for messaging spans.. it is pure overhead. Plus messaging is the most sensitive to overhead. Long story short is that the parentId in an odd position keeps things more efficient for messaging who never care about parent.

note: w3c recently changed their format including an incompatible definition of flags. Notably, they try to tease out trace requested from recorded in ways that don't quite match either sampled status or debug. https://github.com/w3c/distributed-tracing/blob/master/trace_context/HTTP_HEADER_FORMAT.md Very few people were involved in this decision.

for example, what we called sampled ends up flags '03', unsampled or undecided '00', and debug is not expressible

For this reason, the "prefix matches w3c" part is no longer valid.. and this solidifies the need for b3 to remain its own thing as w3c drifts

Since we don't match w3c traceparent anyway, and it will drift further, I think it is a better optimization to do what we do well. I was thinking of a way to simplify and reduce mistakes by taking advantage of the fact that debug flag is the only flag and it is sampling modifier. "X-B3-Flags" is only valid when sampled and boosts sampling decision to the collector tier.

In other words it is a 4th sampling state (undecided, unsampled, sampled, debug). Instead of having a dangling "-1" for this (ex sampled+debug = "1-1"), we can keep our "hyphens plus hex" and simplify by using only a single character 'd' to indicate debug (knowing debug is implicitly sampled).

So this changes from:
b3={x-b3-traceid}-{x-b3-spanid}-{x-b3-sampled}-{x-b3-parentspanid}-{x-b3-flags}
Ex sampling hint 1-1
Ex root span 80f198ee56343ba864fe8b2a57d3eff7-05e3ac9a4f6e3b90-1-1
Ex child span 80f198ee56343ba864fe8b2a57d3eff7-05e3ac9a4f6e3b90-1-e457b5a2e4d86bd1-1

to:
b3={x-b3-traceid}-{x-b3-spanid}-{if x-b3-flags 'd' else x-b3-sampled}-{x-b3-parentspanid}
Ex sampling hint d
Ex root span 80f198ee56343ba864fe8b2a57d3eff7-05e3ac9a4f6e3b90-d
Ex child span 80f198ee56343ba864fe8b2a57d3eff7-05e3ac9a4f6e3b90-d-e457b5a2e4d86bd1

It makes parsing a lot easier when we limit to the existing choices of absent, 0, 1 or d. I think the intuitiveness is worth it. Also, folks who don't care about the parent ID, they can ignore the last field completely and not miss debug flag.

Let's take all the examples and translate them. I've highlighted the ones I believe are clearer (and easier to parse)

Propagate a root (or non-shared) span with no decision yet:
Was b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7
Still b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7

Propagate a root (or non-shared) span with a sampled decision:
Was b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-1
Still b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-1

Propagate a root (or non-shared) span with an unsampled decision:
Was b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-0
Still b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-0

Propagate just unsampled decision:
Was b3: 0
Still b3: 0

Propagate a root (or non-shared) span with a debug decision:
Was b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-1-1
Now b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-d

Propagate an RPC child span with a debug decision:
Was b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-1-5b4185666d50f68b-1
Now b3: 4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-d-5b4185666d50f68b

As you'll notice... we can look at 3 fields (especially in messaging which never shares span ID) and get all info we need. Since this is not a bit field, there's no risk in this drifting.. wdyt?

cc @openzipkin/core and everyone else here of course

here's a concrete implementation that switches to "d is for debug" and results in simpler code as well openzipkin/brave#773

Note: I speculatively updated the description for 'd is for debug'. If it turns out unpopular, I will switch it back.

@adriancole As long as sampled and debug flags are not bit fields, and that's clearly spelled out in docs, I think this is ok. I like the reframing of it to think of debug as just another sampling state, so in that sense it is logical for it to be encoded in the same sampling positional-field. It also removes confusion around "what if sampling is undecided or zero, but debug is on?" type questions.

Makes sense for parent to end up as the last field as well.

Disclaimer: we don't do much of anything with either sampling or debug flags, so I'm not speaking from a place of experience. But what you're proposing does make sense to me.

So far I see 6 thumbs up by people who don't work at a common employer (besides myself). Will move to formalize this

Added a bunch of issues to track support of at least parsing this.

finally got around to the pull request #28