sindresorhus/normalize-url

stripWWW strips domains too

Closed this issue · 10 comments

normalize("www.com")
//-> http://com

This can be rectified with strip-www.

Can you do a PR?

Hello! If no one is working on it right now, I would like to try. It's my first attempt to contribute to another open source project :)

Feel free. If not, I'll try to get to it sometime next week.

Can you give me some more information about the bug?
Does it only fail for the URL www.com?
I ask because I have run some tests with

  • http://testwww.com
  • http://test.www.com

and they passed the tests.

The most important test case is listed above. Also, check out strip-www's test suite.

I'm confused by these cases
t.is(m('http://www.', opts), 'http://www.');
t.is(m('http://www', opts), 'http://www');
from strip-www's test suite. Are those even valid domain names?

They don't have to be valid domains; they test that invalid input isn't further invalidated. Also, "www" could be a local TLD.
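
To make the expected behaviour concrete, here is a minimal sketch of a guarded strip. It is an assumption-laden illustration, not the code from strip-www or normalize-url: `stripWwwGuarded` is a hypothetical helper, and it works on bare hostnames rather than full URLs. It removes a leading "www." only when at least two non-empty labels remain afterwards.

```javascript
// Hypothetical helper for illustration; not the strip-www implementation.
// Strips one leading "www." label only if what remains still looks like
// "label.label", so "www.com", "www." and "www" are left untouched.
function stripWwwGuarded(hostname) {
  if (!hostname.startsWith('www.')) {
    return hostname;
  }

  const rest = hostname.slice('www.'.length);
  const labels = rest.split('.');

  // Require at least two labels after the prefix, none of them empty.
  const hasTwoLabels =
    labels.length >= 2 && labels.every(label => label.length > 0);

  return hasTwoLabels ? rest : hostname;
}

console.log(stripWwwGuarded('www.com'));         // www.com
console.log(stripWwwGuarded('www.example.com')); // example.com
console.log(stripWwwGuarded('www.'));            // www.
console.log(stripWwwGuarded('www'));             // www
console.log(stripWwwGuarded('test.www.com'));    // test.www.com
```

The guard is deliberately structural: it never asks whether the input is a valid domain, only whether stripping would leave fewer than two labels.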

There's an edge case when stripping www subdomains with a regexp:

www.subdomain.domain.tld
www.www.domain.com

will produce:

subdomain.domain.tld
www.domain.com

This happens because the regexp can't tell how far the subdomain extends: it can't know how long the TLD is (.com vs .co.uk) without a large library like parse-domain. Such a library is fine on the server, but not for a browser build.
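
The edge case above can be reproduced with a short sketch. The pattern `/^www\./` is an assumed single-pass strip, not necessarily the regexp normalize-url uses, and `stripOnce` and `countLabels` are hypothetical names. The last two lines use made-up hostnames to show why counting labels does not help: a hostname under a two-part TLD and one with an extra subdomain have the same number of labels.

```javascript
// Assumed single-pass pattern for illustration only.
const stripOnce = hostname => hostname.replace(/^www\./, '');

console.log(stripOnce('www.subdomain.domain.tld')); // subdomain.domain.tld
console.log(stripOnce('www.www.domain.com'));       // www.domain.com

// Label counts alone cannot separate ".co.uk" from "sub.example.com".
const countLabels = hostname => hostname.split('.').length;

console.log(countLabels('www.example.co.uk'));   // 4
console.log(countLabels('www.sub.example.com')); // 4
```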

Interesting - so the edge case of URLs like www.www.domain.com would actually cause normalize-url to no longer be idempotent (calling it once would produce www.domain.com, and then calling it a second time on the result would produce domain.com). As a solution, perhaps just let the stripWWW option strip all leading "www" subdomains? It seems like using "www" as a subdomain in two levels of a URL is asking for trouble anyway.
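
A rough sketch of the idempotency problem and the proposed fix, under the assumption that stripping is done with a simple anchored regexp (`stripOnce`, `stripAll` and `isIdempotentFor` are hypothetical names, not normalize-url APIs):

```javascript
// Single-pass strip: removes only the first leading "www." label.
const stripOnce = hostname => hostname.replace(/^www\./, '');

// Proposed variant: removes every leading "www." label in one call.
const stripAll = hostname => hostname.replace(/^(?:www\.)+/, '');

// A function is idempotent for an input if applying it twice
// gives the same result as applying it once.
const isIdempotentFor = (fn, input) => fn(fn(input)) === fn(input);

console.log(isIdempotentFor(stripOnce, 'www.www.domain.com')); // false
console.log(isIdempotentFor(stripAll, 'www.www.domain.com'));  // true
console.log(stripAll('www.www.domain.com'));                   // domain.com
```

On its own, `stripAll` would still turn www.com into com, so a fix along these lines would also need a guard for the case in the original report.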

It gets worse:

www.app.company.com

produces:

app.company.com

Technically, the subdomain is "www.app", since there is structurally no sub-subdomain. As a result, there could theoretically be no "app.company.com" configured.