twingly/twingly-url

Consider hostnames larger than 1024 characters invalid

walro opened this issue · 1 comments

walro commented

We're seeing Ruby barfing on certain URLs that have very long host portions.

If I understand correctly, this is where the limit is set (it defaults to 1024 unless the environment says otherwise):

https://github.com/ruby/ruby/blob/b753929806d0e42cdfde3f1a8dcdbf678f937e44/ext/socket/socket.c#L897-L903

NI_MAXHOST is set to 1025 in Ubuntu Bionic: http://manpages.ubuntu.com/manpages/bionic/man3/getnameinfo.3.html (see the Notes section)

So 1024 ought to be fine for most of our use cases.

This could be checked in the newly added valid_hostname?/valid_label? methods:

def valid_hostname?(hostname)
  # No need to check the TLD, the public suffix list does that
  labels = hostname.split(DOT)[0...-1].map(&:to_s)
  labels.all? { |label| valid_label?(label) }
end

def valid_label?(label)
  return false if label.start_with?(HYPHEN)
  return false if label.end_with?(HYPHEN)

  label.match?(LETTERS_DIGITS_HYPHEN)
end
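
A rough sketch of how the length check could slot into valid_hostname?. The constant name MAX_HOSTNAME_LENGTH is hypothetical, and DOT, HYPHEN and LETTERS_DIGITS_HYPHEN are redefined here with assumed values only so the snippet runs on its own; the library's real definitions may differ.

```ruby
# Stand-ins for the library's constants (assumed values, for illustration only)
DOT = "."
HYPHEN = "-"
LETTERS_DIGITS_HYPHEN = /\A[a-zA-Z0-9-]+\z/

# Hypothetical constant: the 1024-character limit proposed in this issue
MAX_HOSTNAME_LENGTH = 1024

def valid_label?(label)
  return false if label.start_with?(HYPHEN)
  return false if label.end_with?(HYPHEN)

  label.match?(LETTERS_DIGITS_HYPHEN)
end

def valid_hostname?(hostname)
  # Reject overly long hostnames before doing any per-label work
  return false if hostname.length > MAX_HOSTNAME_LENGTH

  # No need to check the TLD, the public suffix list does that
  labels = hostname.split(DOT)[0...-1].map(&:to_s)
  labels.all? { |label| valid_label?(label) }
end
```

With this in place, a hostname of exactly 1024 characters would still pass, while anything longer would be rejected up front.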

https://en.wikipedia.org/wiki/Domain_Name_System#Domain_name_syntax,_internationalization says:

A label may contain zero to 63 characters. The null label, of length zero, is reserved for the root zone. The full domain name may not exceed the length of 253 characters in its textual representation.[20] In the internal binary representation of the DNS the maximum length requires 255 octets of storage, as it also stores the length of the name.[3]
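
For comparison, the limits quoted above (63 characters per label, 253 for the full name) could be checked along these lines. This is only a sketch: the DnsLimits module and its method are hypothetical names, and ignoring a single trailing root dot is an assumption on my part, not something the quoted text specifies.

```ruby
# Hypothetical helper, not part of twingly-url
module DnsLimits
  MAX_LABEL_LENGTH = 63   # per-label limit quoted above
  MAX_NAME_LENGTH = 253   # textual full-name limit quoted above

  module_function

  def within_dns_limits?(hostname)
    # Assumption: a single trailing dot (the root label) is ignored
    name = hostname.chomp(".")
    return false if name.empty? || name.length > MAX_NAME_LENGTH

    # The -1 limit keeps trailing empty strings, so "a..b" and "a.." are caught
    name.split(".", -1).all? do |label|
      label.length.between?(1, MAX_LABEL_LENGTH)
    end
  end
end
```

These limits are much tighter than 1024, so whether to adopt them instead is a separate decision from avoiding the error described at the top.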