google/sqlcommenter

URL encoding ambiguities

bdewater opened this issue · 0 comments

I noticed a discrepancy in how the implementations go about URL encoding.

  • The Python and PHP implementations URL encode and then escape each % with a second % (so = becomes %%3D). This is mentioned nowhere in the specification.
  • The Go and JavaScript implementations do a simple URL encode.
  • The Java implementation also does not add the extra % prefix, but seems to URL encode twice and reverse the order of the entries.

This produces three different outputs for the same traceparent example value of congo=t61rcWkgMzE,rojo=00f067aa0ba902b7:

| Language | URL encoded traceparent |
| --- | --- |
| Python, PHP | congo%%3Dt61rcWkgMzE%%2Crojo%%3D00f067aa0ba902b7 |
| Go, JS | congo%3Dt61rcWkgMzE%2Crojo%3D00f067aa0ba902b7 |
| Java | rojo%253D00f067aa0ba902b7%2Ccongo%253Dt61rcWkgMzE |
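
To make the three outputs easier to compare, here is a minimal Python sketch that reproduces each of them from the same input. The variable names are mine, and the steps for the Java row are only one sequence that happens to yield that string; I have not confirmed that this is what the Java implementation actually does.

```python
from urllib.parse import quote

value = "congo=t61rcWkgMzE,rojo=00f067aa0ba902b7"

# Go / JS row: a single pass of percent-encoding with no safe characters.
plain = quote(value, safe="")

# Python / PHP row: the same encoding, with every '%' doubled afterwards.
doubled_percent = plain.replace("%", "%%")

# Java row (assumption, for illustration only): encode each entry,
# reverse the entry order, join with ',', then encode the whole string again.
# The second pass turns '%3D' into '%253D' and ',' into '%2C'.
entries = [quote(entry, safe="") for entry in value.split(",")]
double_encoded = quote(",".join(reversed(entries)), safe="")

print(doubled_percent)
print(plain)
print(double_encoded)
```

Note that in the Java-like result only the = is encoded twice, while the , separator is encoded once, which is why a per-entry pass followed by a whole-string pass fits the observed output.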

Consulting the spec answers some of this, but also raises more questions:

Meta characters such as ' should be escaped with a slash \.

The way this is written implies multiple characters, but only one is given?

  1. URL encode the value e.g. given /param first, that SHOULD become %2Fparam%20first
  2. Escape meta-characters within the raw value; a single quote ' becomes \'

This seems to make the Go/JavaScript implementations correct. What threw me off initially is that my language of choice (Ruby) already encodes ' as %27, so the second step seemed superfluous. Checking the other implementations, I noticed that JavaScript's encodeURIComponent does not behave like Ruby here and leaves the ' intact. That explains the second step, but it also raises the question: why this workaround, instead of having ' URL encoded as well?
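
The difference can be sketched in Python, whose urllib.parse.quote percent-encodes ' by default (like the Ruby behaviour described above). The helper names are hypothetical; encode_like_js approximates encodeURIComponent by marking the extra characters it leaves unescaped (! * ' ( )) as safe, and escape_meta applies the spec's second step by prefixing ' with a backslash.

```python
from urllib.parse import quote

# Characters encodeURIComponent leaves unescaped that quote() would
# otherwise percent-encode. quote() already leaves "-", "_", "." and "~" alone.
JS_EXTRA_UNESCAPED = "!*'()"


def encode_strict(raw: str) -> str:
    """Percent-encode everything except unreserved characters (' becomes %27)."""
    return quote(raw, safe="")


def encode_like_js(raw: str) -> str:
    """Approximation of encodeURIComponent: ' is left intact."""
    return quote(raw, safe=JS_EXTRA_UNESCAPED)


def escape_meta(encoded: str) -> str:
    """Second step from the spec: escape a remaining single quote with a backslash."""
    return encoded.replace("'", "\\'")


print(escape_meta(encode_strict("foo'bar")))   # foo%27bar
print(escape_meta(encode_like_js("foo'bar")))  # foo\'bar
print(encode_strict("/param first"))           # %2Fparam%20first
```

With the strict encoder the escaping step never has anything to do, which is why it looked superfluous from the Ruby side; with the JS-like encoder it is the only thing standing between the raw ' and the SQL comment.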

What I'd like to see:

  • A list of known URL encoding edge cases and their correct encoded form(s). Some are already listed at https://google.github.io/sqlcommenter/spec/#key-value-format, but that table contains a copy-paste error in its second row. I think the traceparent example and something like foo'bar would be useful additions.
  • The unit tests and README files for all implementations in this repository using at least this list of cases.
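
As a rough idea of what such a shared list could look like, here is a small Python sketch of a case table plus a checker. The accepted forms are assumptions on my part: they take plain URL encoding as the intended reading of the spec, and for foo'bar they accept both %27 and the backslash-escaped quote, since which one is correct is exactly the open question. CASES, serialize and failing_cases are hypothetical names.

```python
from urllib.parse import quote

# (raw value, accepted serialized forms). Assumed, not taken from the spec,
# except for the "/param first" example.
CASES = [
    ("/param first", {"%2Fparam%20first"}),
    (
        "congo=t61rcWkgMzE,rojo=00f067aa0ba902b7",
        {"congo%3Dt61rcWkgMzE%2Crojo%3D00f067aa0ba902b7"},
    ),
    ("foo'bar", {"foo%27bar", "foo\\'bar"}),
]


def serialize(raw: str) -> str:
    """Hypothetical implementation under test: strict percent-encoding."""
    return quote(raw, safe="")


def failing_cases(serialize_fn, cases=CASES):
    """Return (raw, actual) pairs whose output is not an accepted form."""
    return [
        (raw, serialize_fn(raw))
        for raw, accepted in cases
        if serialize_fn(raw) not in accepted
    ]


print(failing_cases(serialize))
```

Each implementation could run the same table through its own serializer, so a divergence like the doubled % would show up as a failing case instead of being discovered by reading the code.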