bad parsing
Closed this issue · 3 comments
erpatrik commented
Hi,
I found two issues while parsing domains:
tldextract.extract("hokkaido.jp")
ExtractResult(subdomain='', domain='', suffix='hokkaido.jp')
tldextract.extract("ketrzyn.pl")
ExtractResult(subdomain='', domain='', suffix='ketrzyn.pl')
ShmuelTreiger commented
Having the same issue with ne.jp. Not sure if relevant, but ne.jp is actually incorrect; it should be www.ne.jp. I'm working with a legacy system which strips www from URLs. When run on www.ne.jp it works, but that causes other bugs for me.
ShmuelTreiger commented
These are all suffixes on the Public Suffix List. That's why they are parsed like this.
brycedrennan commented
@erpatrik, @ShmuelTreiger is correct: the domains you're testing are in the Public Suffix List, so this library is correctly returning them as suffixes.
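
For anyone landing here later, here is a rough sketch of why a hostname that is itself a public suffix comes back with an empty domain. It is not tldextract's actual implementation: `SUFFIXES` is a tiny hand-picked stand-in for the real list, `split_host` is a hypothetical helper, and wildcard and exception rules are ignored.

```python
# Hypothetical, hand-picked stand-in for the Public Suffix List.
# The real list is far larger and also has wildcard/exception rules,
# which this sketch ignores.
SUFFIXES = {"jp", "pl", "hokkaido.jp", "ketrzyn.pl", "ne.jp"}


def split_host(host):
    """Split a hostname into (subdomain, domain, suffix).

    The longest matching suffix wins. Whatever label sits directly
    to the left of the suffix is the domain; anything further left
    is the subdomain.
    """
    labels = host.lower().strip(".").split(".")

    # Try the longest candidate first: the whole host, then drop
    # labels from the left one at a time.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in SUFFIXES:
            suffix = candidate
            rest = labels[:i]
            break
    else:
        # No known suffix matched at all.
        suffix, rest = "", labels

    domain = rest[-1] if rest else ""
    subdomain = ".".join(rest[:-1])
    return subdomain, domain, suffix


# The whole host is a suffix, so nothing is left over for the domain.
print(split_host("hokkaido.jp"))
# One extra label to the left of the suffix becomes the domain.
print(split_host("www.ne.jp"))
```

With this logic, a host like ne.jp is consumed entirely by the suffix match, while www.ne.jp leaves one label over to act as the domain, which matches the behaviour described above.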