Feature: Get URL (and parts of it) in ASCII
dentarg opened this issue · 11 comments
Related to https://github.com/twingly/klondike/issues/31
Would be useful to expose #normalized_host from Addressable, because Ruby DNS libraries (stdlib, alexdalitz/dnsruby#94) can't handle IDN very well (not at all, I mean).
[18] pry(main)> Dnsruby::DNS.new.getaddress(Addressable::URI.heuristic_parse("räksmörgås.josefßon.org").normalized_host)
=> #<Dnsruby::IPv4 155.4.17.102>
[19] pry(main)> Resolv.getaddress(Addressable::URI.heuristic_parse("räksmörgås.josefßon.org").normalized_host)
=> "155.4.17.102"
And because our own #normalized_host isn't at all suitable to use (we can break URLs). (The terminology here is unfortunate...)
Don't we really want a "to punycode" method somewhere?
> Don't we really want a "to punycode" method somewhere?
That would be better. I'm guessing it would be nice to get both host and all strings that contain host in both ascii and utf8.
I'm not really following...
Is it something like this what we mean?
Twingly::URL.parse("http://räksmörgås.josefßon.org/foobar").to_punycode
# => "http://xn--rksmrgs-5wao1o.josefsson.org/foobar"
How should we implement it? Should we use http://www.rubydoc.info/gems/addressable/Addressable/URI#normalized_host-instance_method or not? In my mind that's the most straightforward and lowest-cost thing to do.
Please elaborate your thoughts! :)
I haven't read everything at https://en.wikipedia.org/wiki/Punycode, but in my mind we only care about Punycode in the context of DNS, the host that is.
Maybe we want a method called punycoded_host?
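For reference, this is roughly what Punycode does to each label of a host. It is a sketch of the encoder described in RFC 3492, written in plain Ruby with no gems. The module and method names are made up for illustration, and it deliberately skips the IDNA mapping and validation steps (only downcasing is applied), so it is not a drop-in replacement for a real IDNA library.

```ruby
# Sketch of the RFC 3492 Punycode encoder (names are hypothetical).
module PunycodeSketch
  BASE = 36
  TMIN = 1
  TMAX = 26
  SKEW = 38
  DAMP = 700
  INITIAL_BIAS = 72
  INITIAL_N = 128

  module_function

  # Digit values 0..25 map to a-z, 26..35 map to 0-9.
  def encode_digit(d)
    (d < 26 ? d + 97 : d - 26 + 48).chr
  end

  # Bias adaptation, RFC 3492 section 6.1.
  def adapt(delta, num_points, first_time)
    delta = first_time ? delta / DAMP : delta / 2
    delta += delta / num_points
    k = 0
    while delta > ((BASE - TMIN) * TMAX) / 2
      delta /= BASE - TMIN
      k += BASE
    end
    k + (BASE - TMIN + 1) * delta / (delta + SKEW)
  end

  # Encode a single label (without the "xn--" prefix).
  def encode(label)
    input = label.codepoints
    basic = input.select { |cp| cp < 128 }
    output = basic.pack("U*")
    h = b = basic.length
    output << "-" if b > 0
    n = INITIAL_N
    delta = 0
    bias = INITIAL_BIAS

    while h < input.length
      m = input.select { |cp| cp >= n }.min
      delta += (m - n) * (h + 1)
      n = m
      input.each do |cp|
        delta += 1 if cp < n
        next unless cp == n
        q = delta
        k = BASE
        loop do
          t = (k - bias).clamp(TMIN, TMAX)
          break if q < t
          output << encode_digit(t + (q - t) % (BASE - t))
          q = (q - t) / (BASE - t)
          k += BASE
        end
        output << encode_digit(q)
        bias = adapt(delta, h + 1, h == b)
        delta = 0
        h += 1
      end
      delta += 1
      n += 1
    end
    output
  end

  # Convert every non-ASCII label of a host; ASCII labels pass through.
  def host_to_ascii(host)
    host.downcase.split(".").map { |label|
      label.ascii_only? ? label : "xn--" + encode(label)
    }.join(".")
  end
end

puts PunycodeSketch.host_to_ascii("räksmörgås.josefsson.org")
# prints "xn--rksmrgs-5wao1o.josefsson.org"
```

Since the conversion works label by label, the same approach would cover internationalized TLDs as well.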
> Is it something like this what we mean?
Yes.
> I haven't read everything at https://en.wikipedia.org/wiki/Punycode, but in my mind we only care about Punycode in the context of DNS, the host that is.
DNS is a part of HTTP.
> How should we implement it? Should we use http://www.rubydoc.info/gems/addressable/Addressable/URI#normalized_host-instance_method or not? In my mind that's the most straightforward and lowest-cost thing to do.
Not until we've looked at alternatives.
If you need this feature now, just use Addressable explicitly in your code.
The title of this issue is now less opinionated.
The punycoded TLD would also be nice to have when dealing with Internationalized ccTLDs.
I'm merging in #72 here, it's the same thing
In one project we have this:
connection = Faraday.new do |faraday|
  faraday.use FaradayMiddleware::FollowRedirects
  faraday.adapter :excon
end

escaped_url = Twingly::URL.parse(url).normalized.to_s
connection.head(escaped_url)

Not sure we should do escaping exactly like this, but it should be a part of twingly-url IMHO.
> Not sure we should do escaping exactly like this
Yeah, normalizing != escaping
#71 could be expanded to cover the whole URL, and then that could be used instead of #normalized in code such as the above.
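To illustrate the difference between escaping and normalizing, here is a small sketch of escaping only. It percent-encodes the UTF-8 bytes of every non-ASCII character and leaves all other characters, including existing percent escapes, as they are. The method name escape_non_ascii is hypothetical, and this is only meant for the non-host parts of a URL, since the host would need Punycode rather than percent-encoding.

```ruby
# Hypothetical helper: percent-encode non-ASCII characters only.
# Nothing is lowercased, reordered, or otherwise normalized.
def escape_non_ascii(str)
  str.gsub(/[^[:ascii:]]/) do |char|
    # Each character becomes one %XX group per UTF-8 byte.
    char.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

puts escape_non_ascii("/räksmörgås?q=a%20b")
# prints "/r%C3%A4ksm%C3%B6rg%C3%A5s?q=a%20b"
```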
Dumping related/interesting links: https://bugs.ruby-lang.org/issues/12852, https://url.spec.whatwg.org/
Heh, I see that Pinboard says "previously saved october 2015" about the above URL and the page now says "Last Updated 25 October 2018". It sure takes some time to compile a solid standard.