Automatically handle punycode encoding and decoding.
amcgregor opened this issue · 2 comments
When deserializing a URL provided at instantiation time, or on any assignment of a hostname, punycode encoding should be detected and decoded. When serializing (str()
, &c.) the punycode should encoded version of any non-ASCII-safe hostname should be used.
Reference in Marrow Mailer:
Initial implementation in 9c6fce3. Tested by hand locally, automated tests TBD prior to release.
Comprehensive tests added, with a test using Cyrillic encoding borrowed from the url-normalize
package also added. There is a point of debate over if the summary
compound attribute (typically used for short-form presentation as the text of an anchor) should use the encoded, or Unicode forms.
Ref: https://github.com/marrow/uri/blob/develop/test/test_url_normalize.py#L53
I'd vote in favor of using the readable form for this, but the encoded form for other compound forms, due to the intended purpose/use of this particular attribute for presentation to end-users.