w3c/trace-context

Discourage vendors from validating other vendors' tracestate entries

justinfoote opened this issue · 4 comments

Section 4.3.3 of the spec says:

The vendor MAY validate the tracestate header. If the tracestate header cannot be parsed the vendor MAY discard the entire header. Invalid tracestate entries MAY also be discarded.

The OpenTelemetry implementations have interpreted this in a way that is problematic for us.
In opentelemetry-dotnet, the entire tracestate is discarded it contains any invalid entries.
In opentelemetry-java, both tracestate and traceparent are discarded if tracestate contains any invalid entries!

Because of OpenTelemetry's interpretation of this clause, I believe every vendor will be forced to validate the entire tracestate payload (or risk OpenTelemetry restarting their traces), and I would very much prefer not to spend the resources parsing other vendors' tracestate entries.

I know we're too late for the first version of this spec, but I'd really like us to revisit this clause for the next version. Instead of a MAY, I'd like this changed to "SHOULD NOT validate the tracestate header".

I think discarding invalid tracestate entries at the very least is not a bad idea. If I propagate bad entries blindly I run the risk of the next system I call discarding mine, sending entries that are too long, or sending entries that break something.

This issue was discussed in the bi-weekly W3C Distributed Tracing working group session today (Feb 25, 2020):

  • Because the tracestate value may have been received through a medium other than HTTP headers, it's possible for unparsable characters (like control characters) to appear in it.
  • Downstream systems may not handle these characters at all, or their behavior when receiving these characters may be unknown. Passing them on risks breaking requests.
  • Even if we change the spec to a SHOULD NOT validate, we still need to consider what a compliant system should do if it encounters a truly unparsable character.
  • It is preferable to drop invalid tracestate entries, rather than dropping the entire tracestate header value.

In the next version of the spec, we should consider changing this to recommend dropping invalid entries from tracestate, rather than dropping the entire tracestate header.

please provide concrete examples of control characters. this would break many other headers in routine use. It should be the responsibility of those copying from another medium to not add control characters. having a specification that says don't validate it doesn't make sense.

allowing invalid also makes it hard to read your state entry if it appears after an invalid one, which would be the case if people intentionally allow corrupt data, and then move that to position zero

Allowing invalid characters while asking to preserve "correct" entries will require to specify which specific characters we can treat as an acceptable erroneous entry that can be dropped individually. I'd vote for keeping the language of the specification and make systems drop the entire tracestate in these situations.