Suffix detection broken for private `uk.com` suffix in version 3.4.3
Closed this issue · 2 comments
kevinmarsh commented
I think the trie suffix detection from #285, released in version 3.4.3, might have broken lookups of the `uk.com` private suffix, which is included in the bundled snapshot:
tldextract/tldextract/.tld_set_snapshot, line 10570 in 6f45fed
Comparing 3.4.2:
>>> import tldextract
>>> tldextract.__version__
'3.4.2'
>>> extractor = tldextract.TLDExtract(include_psl_private_domains=True)
>>> extractor("foo.uk.com")
ExtractResult(subdomain='', domain='foo', suffix='uk.com')
to 3.4.3:
>>> import tldextract
>>> tldextract.__version__
'3.4.3'
>>> extractor = tldextract.TLDExtract(include_psl_private_domains=True)
>>> extractor("foo.uk.com")
ExtractResult(subdomain='foo', domain='uk', suffix='com')
You can see that the `uk.com` suffix is no longer recognized; instead, `uk` is treated as the domain and `foo` as the subdomain.
Weirdly, though, calling the `tldextract.extract` wrapper function gives the same (correct) result in both versions:
>>> import tldextract
>>> tldextract.extract("foo.uk.com", include_psl_private_domains=True)
ExtractResult(subdomain='', domain='foo', suffix='uk.com')
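
For anyone following along, here is a minimal sketch of what I'd expect a trie-based suffix lookup to do with a private suffix. This is not tldextract's actual implementation: `Node`, `build_trie`, and `split_host` are hypothetical names, the suffix lists are tiny hand-picked samples rather than the real snapshot, and wildcard and exception rules are ignored. The trie is keyed by reversed labels, and the longest matching suffix wins, with private entries counted only when requested.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One label in a trie keyed by reversed domain labels (hypothetical)."""

    children: dict[str, "Node"] = field(default_factory=dict)
    is_suffix: bool = False
    is_private: bool = False


def build_trie(public: list[str], private: list[str]) -> Node:
    """Insert every suffix label by label, starting from the rightmost label."""
    root = Node()
    for suffixes, is_private in ((public, False), (private, True)):
        for suffix in suffixes:
            node = root
            for label in reversed(suffix.split(".")):
                node = node.children.setdefault(label, Node())
            node.is_suffix = True
            node.is_private = is_private
    return root


def split_host(host: str, root: Node, include_private: bool) -> tuple[str, str, str]:
    """Return (subdomain, domain, suffix) using the longest matching suffix."""
    labels = host.lower().split(".")
    node = root
    suffix_len = 0
    for depth, label in enumerate(reversed(labels), start=1):
        node = node.children.get(label)
        if node is None:
            break
        # Private entries only count as suffixes when the caller opts in.
        if node.is_suffix and (include_private or not node.is_private):
            suffix_len = depth
    cut = len(labels) - suffix_len
    suffix = ".".join(labels[cut:])
    domain = labels[cut - 1] if cut >= 1 else ""
    subdomain = ".".join(labels[: cut - 1]) if cut >= 1 else ""
    return subdomain, domain, suffix


# Sample data only: "uk.com" is the private suffix from this report.
trie = build_trie(public=["com", "uk", "co.uk"], private=["uk.com"])
print(split_host("foo.uk.com", trie, include_private=True))   # ('', 'foo', 'uk.com')
print(split_host("foo.uk.com", trie, include_private=False))  # ('foo', 'uk', 'com')
```

With private suffixes enabled the sketch gives the 3.4.2 result; with them disabled it gives the output I'm seeing from the `TLDExtract` instance in 3.4.3, which looks as if the private entry is being skipped on that code path. That is only a guess from the symptoms.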
john-kurkowski commented
I'm looking into this. /cc @elliotwutingfeng
john-kurkowski commented
Fixed in 3.4.4. Thanks for the detailed report! That really eased tracking down the bug.