Parsing without PSL still uses PSL for some FQDNs
Closed this issue · 2 comments
jonmartz commented
Hi, we ran into the following issue:
tldextract.extract('kawasaki.jp', include_psl_private_domains=False)
produces ExtractResult(subdomain='', domain='kawasaki', suffix='jp'), which is fine.
tldextract.extract('www.kawasaki.jp', include_psl_private_domains=False)
produces ExtractResult(subdomain='', domain='', suffix='www.kawasaki.jp'), which would be expected if include_psl_private_domains=True, because "*.kawasaki.jp" is a valid suffix according to the public suffix list. But why is the result the same when this parameter is set to False, given that without the "www" prefix the resulting suffix is only "jp"?
Thanks!
john-kurkowski commented
without PSL
I think you're reading the include_psl_private_domains parameter as controlling whether this project uses the PSL to parse domains, or uses something else to parse domains. The parameter actually controls whether PSL private domains are distinguished from public domains. See this section of the README. The rule for kawasaki.jp occurs early in the list, so it is not in the private domain section, and wouldn't be affected by the parameter.
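To make the distinction concrete, here is a minimal sketch of suffix matching against a tiny, hand-written rule set. It is not tldextract's implementation and does not call the library. The names ICANN_RULES, PRIVATE_RULES, and split_host are hypothetical, and the rules are a small illustrative sample, not the real list. It only shows why a wildcard rule in the public section applies regardless of the flag, while a private-section rule applies only when the flag is on.

```python
# Hypothetical, hand-written sample of PSL-style rules (not the real list).
# Public (ICANN) section: includes the wildcard rule discussed in this issue.
ICANN_RULES = {"com", "jp", "*.kawasaki.jp", "!city.kawasaki.jp"}
# Private section: an illustrative entry only.
PRIVATE_RULES = {"blogspot.com"}


def split_host(hostname, include_private=False):
    """Return (subdomain, domain, suffix) using the sample rules above."""
    # The flag only decides whether private-section rules are consulted;
    # the public rules are always used.
    rules = ICANN_RULES | PRIVATE_RULES if include_private else ICANN_RULES
    labels = hostname.lower().split(".")
    suffix_len = 1  # fallback: treat the last label as the suffix

    # Try candidates from longest to shortest so the longest rule wins.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        n = len(labels) - i
        if "!" + candidate in rules:
            # Exception rule: the suffix is the candidate minus its first label.
            suffix_len = n - 1
            break
        wildcard = "*." + ".".join(labels[i + 1:])
        if candidate in rules or wildcard in rules:
            suffix_len = n
            break

    cut = len(labels) - suffix_len
    rest, suffix = labels[:cut], labels[cut:]
    domain = rest[-1] if rest else ""
    subdomain = ".".join(rest[:-1])
    return subdomain, domain, ".".join(suffix)


# The wildcard rule is public, so the flag makes no difference here.
print(split_host("kawasaki.jp"))
print(split_host("www.kawasaki.jp", include_private=False))
print(split_host("www.kawasaki.jp", include_private=True))
# The private-section rule is only honored when the flag is on.
print(split_host("foo.blogspot.com", include_private=False))
print(split_host("foo.blogspot.com", include_private=True))
```

In this sketch, 'www.kawasaki.jp' matches the public wildcard rule with or without the flag, which mirrors the behavior reported above, while 'kawasaki.jp' on its own only matches the plain 'jp' rule.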
jonmartz commented
Thank you very much for the quick response, which resolves our issue.