stripWWW strips domains too
normalize("www.com")
//-> http://com
This can be rectified with strip-www.
Can you do a PR?
Hello! If no one is working on it right now, I would like to try. It's my first attempt to contribute to another open source project :)
Feel free. If not, I'll try to get to it sometime next week.
Can you give me some more information about the bug?
Does it only fail with the URL www.com?
I ask because I ran some tests with
http://testwww.com
http://test.www.com
and they passed.
The most important test case is listed above. Also, check out strip-www's test suite.
I'm confused by these cases
t.is(m('http://www.', opts), 'http://www.');
t.is(m('http://www', opts), 'http://www');
from strip-www's test suite. Are those even valid domain names?
They don't have to be valid domains; they test that invalid input isn't further invalidated. Also, "www" could be a local TLD.
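A minimal sketch of the guard these test cases call for, assuming the check runs on the parsed hostname (via the WHATWG `URL` class built into Node.js and browsers). `stripWww` and `stripWwwFromUrl` are hypothetical names, not the actual API of normalize-url or strip-www: a leading `www.` is dropped only when at least two non-empty labels would remain, so `www.com`, `www.` and `www` come back unchanged.

```javascript
// Hypothetical helper (not the actual normalize-url or strip-www code):
// drop a leading "www." only if at least two non-empty labels remain,
// so a bare "www.com" is never reduced to "com".
function stripWww(hostname) {
  if (!hostname.startsWith('www.')) {
    return hostname; // "testwww.com", "test.www.com" and "www" are left alone
  }
  const rest = hostname.slice('www.'.length);
  const labels = rest.split('.').filter(label => label.length > 0);
  return labels.length >= 2 ? rest : hostname;
}

// Hypothetical wrapper: apply the guard to the hostname of a parsed URL.
function stripWwwFromUrl(input) {
  const url = new URL(input);
  url.hostname = stripWww(url.hostname);
  return url.href;
}

console.log(stripWwwFromUrl('http://www.com'));         // http://www.com/
console.log(stripWwwFromUrl('http://www.example.com')); // http://example.com/
console.log(stripWww('test.www.com'));                  // test.www.com
console.log(stripWww('www.'));                          // www.
```

Counting labels keeps the check free of any suffix list, but it cannot tell which labels actually form the subdomain.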
There's an edge case when stripping www subdomains with a regexp:
www.subdomain.domain.tld
www.www.domain.com
will produce:
subdomain.domain.tld
www.domain.com
This happens because a regexp can't know where the subdomain ends, since it can't know how many labels the TLD spans (.com vs .co.uk) without a large library like parse-domain. Such a library is fine on the server, but not in a browser build.
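The TLD-length problem can be illustrated with a small sketch. Once the public suffix is known, the subdomain is simply every label left of the registrable domain; without that knowledge, a regexp has nothing to anchor on. `SUFFIXES` is a hypothetical three-entry stand-in for the Public Suffix List that libraries like parse-domain bundle, `splitHost` is a hypothetical helper, and the matching is simplified (the real list also has wildcard and exception rules).

```javascript
// Hypothetical stand-in for the Public Suffix List; the real list has
// thousands of entries, which is what makes it costly for a browser build.
const SUFFIXES = new Set(['com', 'uk', 'co.uk']);

// Hypothetical helper: split a hostname into subdomain, registrable domain
// and public suffix, trying the longest candidate suffix first.
function splitHost(hostname) {
  const labels = hostname.split('.');
  for (let i = 1; i < labels.length; i++) {
    const suffix = labels.slice(i).join('.');
    if (SUFFIXES.has(suffix)) {
      return {
        subdomain: labels.slice(0, i - 1).join('.'),
        domain: labels.slice(i - 1).join('.'),
        suffix,
      };
    }
  }
  return null; // no known suffix, or the host is itself a suffix
}

console.log(splitHost('www.example.co.uk').subdomain);   // www
console.log(splitHost('www.www.domain.com').subdomain);  // www.www
console.log(splitHost('www.app.company.com').subdomain); // www.app
console.log(JSON.stringify(splitHost('www.com')));
// {"subdomain":"","domain":"www.com","suffix":"com"}
```

Note that `co.uk` must be tried before `uk`; a pattern like `/^www\./` has no equivalent of that lookup.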
Interesting, so URLs like www.www.domain.com would actually make normalize-url no longer idempotent: calling it once would produce www.domain.com, and calling it a second time on that result would produce domain.com. As a solution, perhaps the stripWWW option could just strip all leading "www" subdomains? Using "www" as a subdomain at two levels of a URL seems like asking for trouble anyway.
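The idempotence concern and the proposed fix can be checked with two hypothetical helpers: `stripOnce` removes a single leading `www.` (with an assumed guard that keeps at least two labels, so `www.com` survives), and `stripAll` repeats it until nothing changes, which is one reading of "strip all leading www subdomains".

```javascript
// Hypothetical: remove one leading "www." if at least two labels remain.
function stripOnce(hostname) {
  if (!hostname.startsWith('www.')) return hostname;
  const rest = hostname.slice('www.'.length);
  return rest.split('.').filter(Boolean).length >= 2 ? rest : hostname;
}

// Hypothetical: repeat stripOnce until it reaches a fixed point.
function stripAll(hostname) {
  let current = hostname;
  for (;;) {
    const next = stripOnce(current);
    if (next === current) return current;
    current = next;
  }
}

const host = 'www.www.domain.com';
console.log(stripOnce(host), stripOnce(stripOnce(host))); // www.domain.com domain.com
console.log(stripAll(host), stripAll(stripAll(host)));    // domain.com domain.com
console.log(stripAll('www.www.com'));                     // www.com
```

Because `stripAll` only returns once `stripOnce` makes no further change, a second call is a no-op, i.e. `stripAll(stripAll(x)) === stripAll(x)`.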
It gets worse:
www.app.company.com
produces:
app.company.com
Technically, the subdomain here is "www.app", since structurally there is no such thing as a sub-subdomain. As a result, "app.company.com" might theoretically not be configured at all.