lupomontero/psl

Domain contains subdomain.

z639 opened this issue · 2 comments

z639 commented

Hi,

Am I missing something, or are these results erroneous?

Version: 1.1.23
https://i.imgur.com/950xpqU.png
tested in Chrome 63 and Firefox 58 (Debian)
https://codepen.io/anon/pen/ddRrqo?editors=1010

Hi @z639,

Thanks for the feedback 😉

I think this issue is simply a misunderstanding of what this library (psl) does and of how the parsed properties (tld, ...) are named. The library parses domains based on the Public Suffix List, so the parsed tld doesn't necessarily match the actual top-level domain (from ICANN's standpoint). In this case, the parsed tld represents the public suffix.

A "public suffix" is one under which Internet users can (or historically could) directly register names. Some examples of public suffixes are .com, .co.uk and pvt.k12.ma.us. The Public Suffix List is a list of all known public suffixes.
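To make that concrete, here is a minimal sketch of longest-suffix matching. It is not psl's actual implementation: the suffix set is a tiny hand-picked sample rather than the real list, the function and property names are made up for illustration, and wildcard and exception rules are ignored.

```javascript
// Tiny, hand-picked stand-in for the Public Suffix List (NOT the real list).
const SAMPLE_SUFFIXES = new Set(['com', 'uk', 'co.uk', 'us', 'pvt.k12.ma.us']);

// Split a hostname around its longest matching public suffix.
// Simplification: wildcard (*.) and exception (!) rules are ignored.
function splitByPublicSuffix(hostname, suffixes = SAMPLE_SUFFIXES) {
  const labels = hostname.toLowerCase().split('.');
  // Start with the whole hostname and drop labels from the left,
  // so the first hit is the longest matching suffix.
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join('.');
    if (suffixes.has(candidate)) {
      // i === 0 means the hostname is itself a public suffix.
      const domain = i > 0 ? labels.slice(i - 1).join('.') : null;
      const subdomain = i > 1 ? labels.slice(0, i - 1).join('.') : null;
      return { publicSuffix: candidate, domain, subdomain };
    }
  }
  return { publicSuffix: null, domain: null, subdomain: null };
}

// publicSuffix is 'co.uk', domain is 'example.co.uk', subdomain is 'www'
console.log(splitByPublicSuffix('www.example.co.uk'));
```

Here `publicSuffix` plays the role of the parsed tld: it is whatever the list says it is, which can span several labels.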

The behaviour you are seeing is actually the expected behaviour:
https://github.com/wrangr/psl/blob/master/test/psl.parse.spec.js#L111

I believe the underlying issue is mainly the confusing property names in the parsed object. A better name for parsed.tld would have been something like parsed.publicSuffix, perhaps while keeping parsed.tld for the actual top-level domain.

I'm afraid that this design issue was inherited from publicsuffix-ruby, and it hasn't been a major problem as long as you understand the parser's intention. However, I can see that they have since added support for what they call private domains, which gives you the option to switch off private (non-ICANN) suffixes, so the parser would then behave as you expected. Would something like this do the trick? It might be worth exploring...
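As a rough idea of what such a switch could look like, here is a sketch under stated assumptions: `ignorePrivate` is a hypothetical option name that does not exist in psl, and the rules are a hand-picked sample (github.io sits in the private section of the real list, while com and io are ICANN suffixes).

```javascript
// Hand-picked sample rules; the real list separates ICANN and private sections.
const SAMPLE_RULES = [
  { suffix: 'com', isPrivate: false },
  { suffix: 'io', isPrivate: false },
  { suffix: 'github.io', isPrivate: true },
];

// Return the longest matching suffix, optionally skipping private rules.
// `ignorePrivate` is a hypothetical option name, not part of psl.
function publicSuffixOf(hostname, { ignorePrivate = false } = {}) {
  const host = hostname.toLowerCase();
  const matches = SAMPLE_RULES
    .filter((rule) => !(ignorePrivate && rule.isPrivate))
    .filter((rule) => host === rule.suffix || host.endsWith('.' + rule.suffix))
    .map((rule) => rule.suffix)
    // Longest suffix (most labels) first.
    .sort((a, b) => b.split('.').length - a.split('.').length);
  return matches.length > 0 ? matches[0] : null;
}

console.log(publicSuffixOf('foo.github.io'));                          // github.io
console.log(publicSuffixOf('foo.github.io', { ignorePrivate: true })); // io
```

With private rules skipped, the suffix falls back to the ICANN entry, which is closer to what most people expect a "tld" to be.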

Thoughts?

z639 commented

Thanks for the info and the fast reply.

Yes, I think that addition would do what I'm looking for.

I found your GitHub repository through this Stack Overflow page: https://stackoverflow.com/questions/9752963/get-domain-name-without-subdomains-using-javascript. I'm basically looking for a way to check that a domain is valid and then remove any subdomains/prefixes, so I can see whether the main domain itself is in an array of whitelisted domains.
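A rough sketch of that check, with the domain extraction passed in as a function so the example runs on its own. The helper name `isWhitelisted` and the naive two-label extractor are assumptions for illustration only. With psl installed, `psl.get` (which returns the domain, or null if the input is not valid) could be passed instead.

```javascript
// Check whether a hostname's main domain is in a whitelist.
// `getDomain` should return the main domain as a string, or null if invalid.
function isWhitelisted(hostname, whitelist, getDomain) {
  const registrable = getDomain(hostname.toLowerCase());
  return registrable !== null && whitelist.includes(registrable);
}

// Demo-only extractor: keeps the last two labels, which is wrong for
// suffixes such as co.uk. Replace with a real parser, e.g. psl.get.
const naiveGetDomain = (host) => {
  const labels = host.split('.');
  return labels.length >= 2 ? labels.slice(-2).join('.') : null;
};

const whitelist = ['example.com'];
console.log(isWhitelisted('www.example.com', whitelist, naiveGetDomain)); // true
console.log(isWhitelisted('www.example.org', whitelist, naiveGetDomain)); // false
```

Note that with a parser that honours private suffixes, a host like foo.github.io has foo.github.io as its main domain, so the whitelist entries need to match that convention.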

I'll try to figure out how they're doing that in the Ruby repository.

Thanks again.