The TLD .com.ru is handled incorrectly
Closed this issue · 3 comments
hkopp commented
Hi,
I have encountered a bug when handling .com.ru domains.
In [1]: import tldextract
In [2]: tldextract.extract('http://lamy.com.ru/')
Out[2]: ExtractResult(subdomain='lamy', domain='com', suffix='ru')
I would have expected the following:
Out[2]: ExtractResult(subdomain='', domain='lamy', suffix='com.ru')
.com.ru is in the list of public suffixes, so this is clearly a bug: https://publicsuffix.org/list/public_suffix_list.dat
Thanks for creating the library, by the way. I first tried sed with increasingly complex regular expressions, but that quickly got out of hand. Your library was exactly what I needed.
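The observed and expected results differ only in which suffix wins the match. Below is a minimal sketch of longest-suffix matching. It assumes a tiny hand-picked suffix set instead of the real list and ignores wildcard and exception rules. It is not tldextract's implementation, and `split_host` is a hypothetical helper.

```python
from __future__ import annotations


def split_host(host: str, suffixes: set[str]) -> tuple[str, str, str]:
    """Split host into (subdomain, domain, suffix) using the longest known suffix.

    Illustrative sketch only: no wildcard/exception rules, no real PSL data.
    """
    labels = host.lower().split(".")
    # Try the longest candidate first: "lamy.com.ru", then "com.ru", then "ru".
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            if i == 0:
                # The whole host is itself a suffix, so there is no domain label.
                return "", "", candidate
            return ".".join(labels[: i - 1]), labels[i - 1], candidate
    # No known suffix (this sketch's choice): treat the last label as the domain.
    return ".".join(labels[:-1]), labels[-1], ""


print(split_host("lamy.com.ru", {"ru"}))            # ('lamy', 'com', 'ru')
print(split_host("lamy.com.ru", {"ru", "com.ru"}))  # ('', 'lamy', 'com.ru')
```

If only `ru` is in the suffix set, the split matches the actual output reported above; once `com.ru` is in the set, it matches the expected output.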
hkopp commented
And my version number:
In [5]: tldextract.__version__
Out[5]: '3.3.1'
hkopp commented
And some troubleshooting of the cache:
(venv) $ pwd
/home/user/.cache/python-tldextract
(venv) $ ls
3.10.5.final__venv__699cb6__tldextract-3.3.1
3.10.6.final__venv__699cb6__tldextract-3.3.1
(venv) $ cat */publicsuffix.org-tlds/*.tldextract.json| jq '.'| grep ".com.ru"
"com.ru",
"com.ru",
john-kurkowski commented
Thanks for the thorough diagnostic info! That suffix is pretty far down the list, in the private domains section, which tldextract ignores by default. See the FAQ.
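The list file marks its ICANN and private sections with comment delimiters (`// ===BEGIN PRIVATE DOMAINS===` and so on). Below is a minimal sketch of a parser that keeps the two sections apart, so that a private rule like com.ru is dropped unless explicitly requested. It assumes a hand-abbreviated excerpt of public_suffix_list.dat containing only the markers and two rules. It is not tldextract's actual parser, and `load_suffixes` and `PSL_EXCERPT` are hypothetical names.

```python
from __future__ import annotations

# Hand-abbreviated excerpt for illustration only; the real file is far longer.
PSL_EXCERPT = """\
// ===BEGIN ICANN DOMAINS===
ru
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
com.ru
// ===END PRIVATE DOMAINS===
"""


def load_suffixes(text: str, include_private: bool = False) -> set[str]:
    """Collect suffix rules, skipping the private section unless asked for it."""
    suffixes: set[str] = set()
    in_private = False
    for raw in text.splitlines():
        line = raw.strip()
        if "===BEGIN PRIVATE DOMAINS===" in line:
            in_private = True
            continue
        if "===END PRIVATE DOMAINS===" in line:
            in_private = False
            continue
        if not line or line.startswith("//"):
            continue  # blank lines and comments carry no rules
        if in_private and not include_private:
            continue  # private rule, excluded by default
        suffixes.add(line)
    return suffixes


print(sorted(load_suffixes(PSL_EXCERPT)))                        # ['ru']
print(sorted(load_suffixes(PSL_EXCERPT, include_private=True)))  # ['com.ru', 'ru']
```

With only `ru` loaded, lamy.com.ru splits into subdomain `lamy`, domain `com`, which matches the output in the original report. In tldextract itself, the opt-in is the `include_psl_private_domains` parameter, e.g. `tldextract.TLDExtract(include_psl_private_domains=True)`. With it set, .com.ru should be treated as a suffix; the FAQ is the authority on the exact behavior.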