john-kurkowski/tldextract

Suffix detection broken for private `uk.com` suffix in version 3.4.3


I think the trie suffix detection from #285 in version 3.4.3 might have broken lookup of the private `uk.com` suffix (which is included in the bundled snapshot).

Comparing 3.4.2:

>>> import tldextract
>>> tldextract.__version__
'3.4.2'
>>> extractor = tldextract.TLDExtract(include_psl_private_domains=True)
>>> extractor("foo.uk.com")
ExtractResult(subdomain='', domain='foo', suffix='uk.com')

to 3.4.3:

>>> import tldextract
>>> tldextract.__version__
'3.4.3'
>>> extractor = tldextract.TLDExtract(include_psl_private_domains=True)
>>> extractor("foo.uk.com")
ExtractResult(subdomain='foo', domain='uk', suffix='com')

You can see that the `uk.com` suffix is no longer recognized; instead, `uk` is treated as the domain.

Weirdly, though, the `tldextract.extract` wrapper function gives the exact same (correct) result in both versions:

>>> import tldextract
>>> tldextract.extract("foo.uk.com", include_psl_private_domains=True)
ExtractResult(subdomain='', domain='foo', suffix='uk.com')
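
For anyone unfamiliar with the approach, here is a rough sketch of how trie-based suffix matching with a private-suffix flag can work in general. It is not tldextract's actual code: `SuffixTrie` and `split_host` are hypothetical names, the suffix list is a two-entry toy, and wildcard and exception rules are ignored. It only illustrates why the expected result for `foo.uk.com` depends on whether private suffixes are honored during the lookup.

```python
from typing import Dict, Tuple


class SuffixTrie:
    """Hypothetical reverse-label trie; NOT tldextract's real data structure."""

    def __init__(self) -> None:
        self.children: Dict[str, "SuffixTrie"] = {}
        self.is_end = False      # a suffix ends at this node
        self.is_private = False  # that suffix is from the private section

    def add(self, suffix: str, private: bool = False) -> None:
        node = self
        # Store labels right to left, so "uk.com" becomes com -> uk.
        for label in reversed(suffix.split(".")):
            node = node.children.setdefault(label, SuffixTrie())
        node.is_end = True
        node.is_private = private


def split_host(
    trie: SuffixTrie, host: str, include_private: bool
) -> Tuple[str, str, str]:
    """Return (subdomain, domain, suffix) using the longest allowed suffix."""
    labels = host.lower().split(".")
    node = trie
    suffix_len = 0
    for depth, label in enumerate(reversed(labels), start=1):
        node = node.children.get(label)
        if node is None:
            break
        # Private suffixes only count when the caller opted in.
        if node.is_end and (include_private or not node.is_private):
            suffix_len = depth
    cut = len(labels) - suffix_len
    suffix = ".".join(labels[cut:])
    domain = labels[cut - 1] if cut >= 1 else ""
    subdomain = ".".join(labels[: cut - 1]) if cut >= 1 else ""
    return subdomain, domain, suffix


# Toy suffix list (assumption): "com" is public, "uk.com" is private.
trie = SuffixTrie()
trie.add("com")
trie.add("uk.com", private=True)

print(split_host(trie, "foo.uk.com", include_private=True))
# ('', 'foo', 'uk.com')
print(split_host(trie, "foo.uk.com", include_private=False))
# ('foo', 'uk', 'com')
```

The second result has the same shape as the 3.4.3 output above, which is what you would get if the private entries were not being honored for that call path. That is only a guess about the symptom, not a diagnosis of the actual bug.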

I'm looking into this. /cc @elliotwutingfeng

Fixed in 3.4.4. Thanks for the detailed report! That really eased tracking down the bug.