python-hyper/hyperlink

Excessive escaping of "=" in query string parameter values

mahmoud opened this issue · 6 comments

As a more specific continuation of the discussion in #11, it would seem that the = character is yet another special case. While = is a meaningful character in the query string, separating keys and values, only the first = does that.

Digging in further, empty query parameter keys are OK. And equals signs in query parameter values are OK.

# Werkzeug request object for "GET http://localhost:5000/?=x=x=x="
# from Firefox and Chrome
(Pdb) request.args
ImmutableMultiDict([('', u'x=x=x=')])

Seen here, and in their developer tools, Firefox and Chrome do not encode the equals signs. On the server side, Werkzeug is ok with this.
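The same split-on-first-"=" behavior can be checked without a browser. This is a minimal sketch assuming Python 3's `urllib.parse` (the report above used Werkzeug, so the parser here is a stand-in, not the one from the session above):

```python
from urllib.parse import parse_qsl

# Only the first "=" separates key from value; the rest stay in the value.
pairs = parse_qsl("=x=x=x=")
print(pairs)
```

The key comes back empty and the value keeps every remaining equals sign, matching what Werkzeug reported.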

Now, urllib does encode them, but I think this is only because its implementation is lazy. :)
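For reference, the urllib escaping can be reproduced like this. It is a sketch assuming Python 3's `urllib.parse.urlencode`; the original discussion does not say which urllib entry point was meant:

```python
from urllib.parse import urlencode

# urlencode percent-encodes every "=" inside a value, not just ambiguous ones.
encoded = urlencode([("", "x=x=x")])
print(encoded)
```

Every "=" in the value becomes `%3D`, even though a parser would have split on the first one only.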

The example above isn't from nowhere; it's from twisted.python.test.test_url. The problem I see is that Twisted expects urllib's overzealous escaping behavior:

self.assertEqual(u.asText(), 'http://localhost/?=x%3Dx%3Dx')

And I think that should change to simply roundtrip to 'http://localhost/?=x=x=x'. Thoughts, @glyph?
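To make the proposal concrete, here is a rough sketch of a serializer that leaves "=" alone in values but still escapes it in keys. `serialize_query` is a hypothetical helper written for illustration with the standard library; it is not hyperlink's or Twisted's actual code:

```python
from urllib.parse import parse_qsl, quote


def serialize_query(pairs):
    """Hypothetical helper: join (key, value) pairs into a query string."""
    parts = []
    for key, value in pairs:
        # Keys must escape "=", since the first "=" is the delimiter.
        escaped_key = quote(key, safe="")
        # Values may keep "=", but "&", "#", "%" and friends are still escaped.
        escaped_value = quote(value, safe="=")
        parts.append(escaped_key + "=" + escaped_value)
    return "&".join(parts)


query = serialize_query([("", "x=x=x")])
roundtrip = parse_qsl(query, keep_blank_values=True)
tricky = serialize_query([("a=b", "c&d=e")])
```

Under these assumptions the empty-key case serializes without any `%3D`, and parsing the result gives back the original pair. A key containing "=" is still escaped, so the split on the first "=" stays unambiguous.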

glyph commented

Given that =x=x=x is indeed parseable, let's follow our usual convention here and leave it as-is, as you suggest.

When we address #11, we may want to be a little more zealous, given that the closest reading of https://url.spec.whatwg.org/#concept-urlsearchparams-update that I have the time for suggests (via https://url.spec.whatwg.org/#concept-urlencoded-byte-serializer) that = should be %-encoded when updating query params.

Cool. I've created #9257 in trac for when I actually get around to loosening up those tests in Twisted.

As for the WHATWG, I think they're trying to flex muscle they don't have. If Chrome and Firefox don't do it, I'm not sure which Hypertext Applications do, you know? For a variety of reasons I can go into if you'd like, my preference is RFC 3986 (and certain other non-obsolete RFCs) followed by browser behavior and then, somewhere down the list, WHATWG.

glyph commented

I am pretty sure I inherited my preference for WHATWG from @Lukasa; I probably don't have a strong opinion myself. t.p.url was definitely more RFC3986-derived originally.

glyph commented

Could we have an issue specifically for documenting spec preferences at a project level somewhere, along with a rationale? This feels like the sort of thing that is insanely subtle and would benefit a lot from being spelled out in a truly ruinous level of detail.

fixed in #39!