john-kurkowski/tldextract

Extract assigns wrong top-level domain (private suffix value) to the suffix attribute

nerses0 opened this issue · 2 comments

Hi,
It seems that the extract method fails to assign the proper public suffix value for URLs/domains that use a public suffix from the list below:
// CentralNic : http://www.centralnic.com/names/domains
// Submitted by registry gavin.brown@centralnic.com
ae.org
br.com
cn.com
com.de
com.se
de.com
eu.com
gb.net
hu.net
jp.net
jpn.com
mex.com
ru.com
sa.com
se.net
uk.com
uk.net
us.com
za.bz
za.com

Here are a few examples:

extract('http://anysubdomain.anydomain.za.com')
ExtractResult(subdomain='anysubdomain.anydomain', domain='za', suffix='com', is_private=False)
extract('http://anysubdomain.anydomain.gb.net')
ExtractResult(subdomain='anysubdomain.anydomain', domain='gb', suffix='net', is_private=False)
extract('http://anysubdomain.anydomain.us.com')
ExtractResult(subdomain='anysubdomain.anydomain', domain='us', suffix='com', is_private=False)
extract('http://anysubdomain.anydomain.ru.com')
ExtractResult(subdomain='anysubdomain.anydomain', domain='ru', suffix='com', is_private=False)

The expected suffix for the last one, for example, is ru.com.

You need to set include_psl_private_domains=True.

extract('http://anysubdomain.anydomain.za.com', include_psl_private_domains=True)
ExtractResult(subdomain='anysubdomain', domain='anydomain', suffix='za.com', is_private=True)

See https://github.com/john-kurkowski/tldextract/blob/master/README.md#public-vs-private-domains
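Why the flag changes the split can be sketched with a toy longest-suffix matcher. This is not tldextract's implementation: it assumes a tiny hypothetical subset of the Public Suffix List (`ICANN_SUFFIXES`, `PRIVATE_SUFFIXES`) and a hypothetical helper `split_host`, and it ignores the PSL's wildcard and exception rules. Entries such as `za.com` and `ru.com` come from the CentralNic block in the PSL's private section (hence `is_private=True` above), so when private rules are excluded, the longest matching suffix is just `com`.

```python
# Hypothetical, tiny stand-in for the Public Suffix List (PSL).
# The real list has thousands of rules, including wildcard (*) and
# exception (!) rules, which this sketch deliberately ignores.
ICANN_SUFFIXES = {"com", "net", "bz"}
PRIVATE_SUFFIXES = {"za.com", "gb.net", "us.com", "ru.com"}


def split_host(host: str, include_private: bool = False) -> tuple[str, str, str]:
    """Split host into (subdomain, domain, suffix) by longest-suffix matching.

    Hypothetical helper for illustration only; not part of tldextract.
    """
    rules = (ICANN_SUFFIXES | PRIVATE_SUFFIXES) if include_private else ICANN_SUFFIXES
    labels = host.lower().split(".")
    # Try candidate suffixes from longest to shortest; the first hit is the longest match.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in rules:
            domain = labels[i - 1] if i > 0 else ""
            subdomain = ".".join(labels[: max(i - 1, 0)])
            return subdomain, domain, candidate
    # No rule matched: this sketch simply reports no suffix.
    return "", host, ""


if __name__ == "__main__":
    # Private rules excluded: 'za.com' is not a known suffix, so 'com' wins.
    print(split_host("anysubdomain.anydomain.za.com"))
    # Private rules included: 'za.com' is now the longest match.
    print(split_host("anysubdomain.anydomain.za.com", include_private=True))
```

With private rules excluded, the sketch reproduces the split reported in the issue (`('anysubdomain.anydomain', 'za', 'com')`); with them included it matches the `include_psl_private_domains=True` result (`('anysubdomain', 'anydomain', 'za.com')`).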

Thanks! It works!