Valid baggage keys

Question

james-callahan opened this issue 7 years ago · 5 comments

Background

Baggage keys are currently specified to be "a string". However, baggage often needs to be transmitted in places that don't support a clean string namespace.
For example, when baggage is transmitted as an HTTP header (uberctx-mykey), the key cannot contain a colon.
This gets more complex if baggage keys go through Unicode or case normalisation.

Proposal

Either the domain of keys needs to be reduced (e.g. mandate lower-case keys), or the full string space needs to be called out so that encodings (such as the uberctx HTTP header prefix) don't miss an encoding/escaping step.
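
To make the header problem concrete, here is a minimal Python sketch. It assumes the key is appended verbatim to the uberctx- prefix and checks the result against the "token" grammar that RFC 7230 defines for header field names. The function name survives_header_round_trip is hypothetical and not part of any Jaeger client.

```python
import re

# Characters RFC 7230 allows in an HTTP header field name ("token").
_TOKEN_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

PREFIX = "uberctx-"  # header prefix mentioned in this issue


def survives_header_round_trip(key: str) -> bool:
    """Return True if PREFIX + key can be sent verbatim as a header name.

    Hypothetical helper for illustration only.
    """
    name = PREFIX + key
    if not _TOKEN_RE.fullmatch(name):
        return False  # e.g. colon, space, or any non-ASCII character
    # Header names are case-insensitive, so the original case may be lost
    # in transit; only all-lower-case keys are safe from that.
    return key == key.lower()


print(survives_header_round_trip("my_key"))  # True
print(survives_header_round_trip("my:key"))  # False
print(survives_header_round_trip("MyKey"))   # False
```

Any key that fails this check would need either an escaping step or a rule that forbids it.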

Answer 1 · 2018-05-02T02:22:06.000Z

I consider the uberctx-{key} scheme to be, in retrospect, a mistake. We could've easily achieved the same goals with a uberctx: key=value format, which is also being proposed for W3C Trace Context. At Uber we're stuck with the uberctx-{key} format now, as changing it requires upgrading client libs in 1000+ applications, which is ... well, you know. So our internal guideline is that keys can only be alphanum-snake-case (which in practice is perfectly acceptable).

That doesn't mean we cannot solve this problem in Jaeger: we could either introduce an encoding for keys that are not alphanum-snake-case, or implement a different codec for a format that's similar to the W3C one.
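
Both options can be sketched in a few lines of Python. The sketch assumes that "alphanum-snake-case" means lower-case ASCII letters and digits separated by single underscores, and that a single-header codec would percent-encode keys and values and join the pairs with commas. The names is_alphanum_snake_case, encode_baggage and decode_baggage are hypothetical; this is neither Jaeger's codec nor the W3C format.

```python
import re
from urllib.parse import quote, unquote

# Assumed reading of "alphanum-snake-case": a-z, 0-9, single underscores.
_SNAKE_RE = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*")


def is_alphanum_snake_case(key: str) -> bool:
    """Option 1: restrict the key space so no escaping is needed."""
    return _SNAKE_RE.fullmatch(key) is not None


def encode_baggage(baggage: dict) -> str:
    """Option 2: one header value of percent-encoded key=value pairs."""
    return ",".join(
        f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in baggage.items()
    )


def decode_baggage(header_value: str) -> dict:
    """Inverse of encode_baggage; empty items are skipped."""
    result = {}
    for item in header_value.split(","):
        if not item:
            continue
        key, _, value = item.partition("=")
        result[unquote(key)] = unquote(value)
    return result


encoded = encode_baggage({"my:key": "a=b,c"})
print(encoded)                  # my%3Akey=a%3Db%2Cc
print(decode_baggage(encoded))  # {'my:key': 'a=b,c'}
```

Because the separators are escaped inside keys and values, the decoder can split on "," and "=" without ambiguity.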

Answer 2 · 2018-05-02T02:51:53.000Z

Looking at the W3C spec:

Name starts with the beginning of the string or separator ',' and ends with the equal sign '='. The contents of the name are any url encoded string that does not contain an equal sign '='. Names should intuitively identify a the tracing system even if multiple systems per vendor are present.

So the baggage key space is reduced to any string that can be encoded with URL encoding (which is all of them?).

I assume the language here indicates that the equals sign itself should be encoded as %3D and a comma as %2C (or are they banned entirely?).
I'm not sure whether the null byte is allowed.
The spec doesn't say it isn't, but I wouldn't trust browsers/libraries to handle it well.
What happens to unpaired Unicode surrogates? (UTF-8 vs WTF-8)
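
These edge cases can be probed with Python's urllib.parse.quote, used here only as one example of a URL-encoding implementation; other libraries may behave differently. The helper names encode_key and try_encode are hypothetical.

```python
from urllib.parse import quote, unquote


def encode_key(key: str) -> str:
    """Percent-encode everything outside the unreserved set (A-Z a-z 0-9 _.-~)."""
    return quote(key, safe="")


def try_encode(key: str):
    """Return the encoded key, or None if the key is not encodable as UTF-8."""
    try:
        return encode_key(key)
    except UnicodeEncodeError:
        return None


print(encode_key("a=b,c"))   # a%3Db%2Cc
print(encode_key("\x00"))    # %00 (the null byte is encodable)
print(try_encode("\ud800"))  # None (strict UTF-8 rejects an unpaired surrogate)

# Opting in to a WTF-8-style byte sequence for the surrogate:
print(quote("\ud800", safe="", errors="surrogatepass"))  # %ED%A0%80
```

So in this implementation the separators and the null byte round-trip, while an unpaired surrogate is rejected unless the caller explicitly asks for a non-standard encoding.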

Answer 3 · 2018-05-02T13:15:10.000Z

I hope the null byte isn't valid but I might have to handle that too: https://github.com/isaachier/jaeger-client-c/blob/master/src/jaegertracingc/key_value.h#L34-L37. Other than C, most languages handle that gracefully.

Answer 4 · 2018-05-03T01:22:54.000Z

@isaachier one incompatibility of treating them as 8-bit (minus null byte) C strings is that you would allow invalid UTF-8, while e.g. JavaScript would need valid Unicode (but allows unpaired surrogates).
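
The mismatch between the two models can be illustrated in Python, which happens to be able to represent both a raw byte string and a string containing a surrogate code point. This is only a sketch of the two key spaces; the helper names are hypothetical.

```python
def is_c_string_safe(raw: bytes) -> bool:
    """A null-terminated C string can hold any bytes except 0x00."""
    return b"\x00" not in raw


def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes decode as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def contains_surrogate(text: str) -> bool:
    """True if the string holds a surrogate code point (U+D800..U+DFFF).

    A Python str stores code points, so a surrogate in it cannot be
    encoded as strict UTF-8.
    """
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


c_key = b"\xff\xfe"  # fine as a C string, not valid UTF-8
print(is_c_string_safe(c_key), is_valid_utf8(c_key))  # True False

js_like_key = "\ud800"  # representable in a JavaScript string
print(contains_surrogate(js_like_key))  # True
```

A key that is legal in one model therefore has no faithful representation in the other unless the spec picks a common subset or an escaping rule.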

Answer 5 · 2018-05-03T01:47:48.000Z

I have an encoding method in that code too, but this all assumes the null byte is guaranteed to terminate a string (i.e. no need to maintain length).