python-hyper/hyperlink

Decode percent-encoding in mixed text

mahmoud opened this issue · 3 comments

Right now, _url._percent_decode() has a fast, silent path with some surprising results. If the text you pass in contains any non-ASCII characters, you get it back unmodified, even when percent-encoded sequences are present.

>>> _percent_decode(u'é%3Dmc^2')
u'é%3Dmc^2'

This poses an obvious problem for decoding IRI values containing reserved characters, as is the case for DecodedURL. #54 worked around this by re-percent-encoding everything before percent-decoding it. That workaround is a bit hacky, and there are more efficient ways of approaching this.
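
For illustration only, here is a minimal Python 3 sketch of one way to decode percent-encoding in mixed text without re-encoding first. It is not the code in hyperlink or in any PR; the name percent_decode_mixed and the regex-based approach are assumptions made for this example. It decodes each run of %XX escapes as UTF-8 and leaves a run untouched if its bytes are not valid UTF-8.

```python
import re

# Matches one or more consecutive %XX escapes, so multi-byte UTF-8
# sequences such as %C3%A9 are decoded together.
_ESCAPE_RUN = re.compile(r'(?:%[0-9A-Fa-f]{2})+')


def percent_decode_mixed(text):
    """Hypothetical helper: decode %XX escapes in text that may also
    contain literal non-ASCII characters."""
    def _decode_run(match):
        raw = bytes.fromhex(match.group(0).replace('%', ''))
        try:
            return raw.decode('utf-8')
        except UnicodeDecodeError:
            # Simplification: keep the whole run as-is if it is not
            # valid UTF-8, rather than decoding it partially.
            return match.group(0)

    return _ESCAPE_RUN.sub(_decode_run, text)


print(percent_decode_mixed('é%3Dmc^2'))
```

With this sketch, the input from the example above comes back with %3D decoded to = while the literal é is left alone. A stray % that is not followed by two hex digits is passed through unchanged.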

glyph commented

My understanding is that this is just about refactoring the behavior of DecodedURL internally, not a public-facing behavior change or new API?

That's correct. And there's a PR for it, seeking your approval: #59 :)

Fixed by #59, merged yesterday, released today. 🎉