remove "family" attribute
glyph opened this issue · 3 comments
I agree this one isn't ideal, but I'm not yet confident that my other idea for an API is any better.
First, family exists to make the URL more useful for socket interoperation. Several socket APIs take a family argument, and this maps exactly to that. Furthermore, if family is socket.AF_INET6, then to_text will wrap the hostname in square brackets, as the URI standard (RFC 3986) requires for IPv6 literals. But square brackets are a no-go for socket APIs.
In short, with the split into (host, port, family), a URL is usable with socket out of the box.
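A rough sketch of that arrangement, assuming a URL is reduced to a (host, port, family) triple. The helper names authority_text and sockaddr are hypothetical and are not hyperlink's API. The family flag drives the square brackets in the text form, while the address handed to socket stays bracket-free:

```python
import socket


def authority_text(host, port, family):
    """Render host:port for URL text; IPv6 literals get square brackets (RFC 3986)."""
    if family == socket.AF_INET6:
        return "[%s]:%d" % (host, port)
    return "%s:%d" % (host, port)


def sockaddr(host, port, family):
    """Return the address tuple a socket API expects: never bracketed.

    family is unused here; it is kept only to mirror the triple.
    connect() accepts a plain (host, port) tuple for both AF_INET and AF_INET6.
    """
    return (host, port)


print(authority_text("::1", 8080, socket.AF_INET6))             # [::1]:8080
print(sockaddr("::1", 8080, socket.AF_INET6))                   # ('::1', 8080)
print(authority_text("www.example.com", 80, socket.AF_INET6))   # [www.example.com]:80
```

The last call shows the weak spot of this design: nothing ties family to host, so a hostname flagged as AF_INET6 comes out bracketed.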
As for alternatives, one option I considered before hyperlink, and am still considering now, is to allow square brackets in the value for host and use those as the IPv6 flag. There are a couple of downsides to this:
- Users have to manually wrap/unwrap square brackets (or use a separate/new API)
- Doesn't give any indication as to whether the address parsed as IPv4
But at least family and host would be less likely to get out of sync. Right now, using .replace() to change a URL's host from IPv4 to IPv6 also requires passing family=socket.AF_INET6 (because of the colons). As I type this, I think that may be why I made parse_host part of the public API (#25): I slightly preferred it to manual IPv6 guessing and square-bracket wrapping.
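The bracket-as-flag alternative could look roughly like this. split_host is a hypothetical helper, not hyperlink's parse_host, whose exact signature isn't given here. It uses the standard-library ipaddress module to report which family a literal parsed as, which would cover the "no indication of IPv4" downside with a separate helper:

```python
import ipaddress
import socket


def split_host(host):
    """Return (family, bare_host) for a host value that may be bracketed.

    family is AF_INET6 or AF_INET for IP literals, None for registered names.
    """
    if host.startswith("[") and host.endswith("]"):
        bare = host[1:-1]
        # Brackets are the IPv6 flag: ip_address() raises ValueError for names,
        # and we also reject bracketed IPv4 literals.
        if ipaddress.ip_address(bare).version != 6:
            raise ValueError("only IPv6 literals may be bracketed: %r" % host)
        return socket.AF_INET6, bare
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return None, host  # a name such as www.example.com
    if addr.version == 6:
        # Design choice for this sketch: unbracketed IPv6 is rejected,
        # because the brackets are supposed to be the flag.
        raise ValueError("IPv6 literals must be bracketed: %r" % host)
    return socket.AF_INET, host
```

Users still have to strip the brackets (here via the returned bare_host) before handing the host to a socket API, which is the first downside listed above.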
What do you think?
My go-to guide to socket interoperation for non-experts is generally:
- don't use the socket module directly, it's harder than you think
- there is no step 2
For a public-facing API, I think this is an ill-considered way to try to be "helpful". For example, socket.socket(url.family, socket.SOCK_STREAM) will just raise an exception unless the URL was specified with an IP address literal, so it's not generally usable. When a hostname is specified, defaulting to either IPv4 or IPv6 is just wrong; you have to try both, and that's more complicated than issuing a single .connect call. So this isn't even just me saying "use Twisted for your sockets"; I honestly can't imagine an application where this would be helpful.
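For contrast, connecting to a hostname correctly means resolving it and trying every address it resolves to, IPv4 and IPv6 alike. A minimal sketch of that loop with socket.getaddrinfo follows; connect_any is a hypothetical name, and the standard library's socket.create_connection already does essentially this:

```python
import socket


def connect_any(host, port, timeout=5.0):
    """Try each address host resolves to, in getaddrinfo order, until one connects."""
    last_error = None
    for family, socktype, proto, _canonname, addr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(addr)
            return sock
        except OSError as exc:
            # This family/address failed; close it and fall through to the next one.
            sock.close()
            last_error = exc
    if last_error is None:
        raise OSError("getaddrinfo returned no addresses for %r" % (host,))
    raise last_error
```

Note that the family comes from resolution, not from the URL. A strictly sequential loop like this can also be slow when one family is broken, which is why real clients race attempts (Happy Eyeballs, RFC 8305).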
Internally, in the data representation, this causes bugs. If someone were to read these docs and "helpfully" pass in socket.AF_INET6 to be sure they were IPv6-ready, you'd get this hot pile of nonsense:
>>> hyperlink.URL(family=socket.AF_INET6, host=u'www.example.com')
URL.from_text(u'http://[www.example.com]')
So my take on this would be to remove it entirely, make the square-bracket distinction based on the presence of colons in the host and not as a separate flag, and validate that any host with colons in it is a valid IPv6 address on construction.
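A sketch of that proposal, using hypothetical helpers validate_host and host_text (not hyperlink's actual code). Brackets are derived from colons in the host, and any host containing a colon must parse as an IPv6 address when it is set:

```python
import ipaddress


def validate_host(host):
    """Reject hosts containing colons unless they are valid IPv6 literals."""
    if ":" in host:
        try:
            ipaddress.IPv6Address(host)
        except ValueError:
            raise ValueError("invalid IPv6 host: %r" % (host,)) from None
    return host


def host_text(host):
    """Render a host for URL text, bracketing it only if it is an IPv6 literal."""
    validate_host(host)
    return "[%s]" % host if ":" in host else host
```

With this shape, www.example.com can no longer come out bracketed, because there is no family flag left to set, and a host such as www.example.com:80 is rejected up front instead of producing a malformed URL.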
Fixed in 17.3.0, just released last night, thanks again!