john-kurkowski/tldextract

FQDN extraction error in some domains

Closed this issue · 2 comments

Error in extraction for

  • subdomain.domain.com.de
  • domain.com.de
  • subdomain.domain.com.se
  • domain.com.se

com.de and com.se are valid suffixes in the Public Suffix List, but the extractor does not recognize them.

Example:

import tldextract
tldextract.extract('http://forums.test123.com.de/')
ExtractResult(subdomain='forums.test123', domain='com', suffix='de')

A website that uses this, e.g.:
https://herbalife(dot)com(dot)se/
(hidden to avoid backlinks)

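Not an issue, this is the intended behaviour. com.de and com.se are in the private section of the Public Suffix List, which tldextract ignores by default. To include private suffixes: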

extractor = tldextract.TLDExtract(include_psl_private_domains=True)
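
To illustrate why the default result splits at 'de', here is a minimal toy sketch of longest-suffix matching with an optional private rule set. It is not tldextract's actual implementation: the names `split_host`, `ICANN_SUFFIXES` and `PRIVATE_SUFFIXES` are hypothetical, and the two rule sets are a tiny hand-picked subset assumed for this example only.

```python
# Toy model of Public Suffix List matching; NOT tldextract's real code.
# Both rule sets are a small hand-picked subset, assumed for illustration.
ICANN_SUFFIXES = {"com", "de", "se"}
PRIVATE_SUFFIXES = {"com.de", "com.se"}  # assumed to be private-section entries


def split_host(host, include_private=False):
    """Return (subdomain, domain, suffix) using the longest matching suffix."""
    if include_private:
        rules = ICANN_SUFFIXES | PRIVATE_SUFFIXES
    else:
        rules = ICANN_SUFFIXES
    labels = host.lower().strip(".").split(".")
    # Try the longest candidate suffix first, then progressively shorter ones.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in rules:
            domain = labels[i - 1] if i > 0 else ""
            subdomain = ".".join(labels[: i - 1]) if i > 1 else ""
            return subdomain, domain, candidate
    # No rule matched: treat the last label as the domain, with no suffix.
    return ".".join(labels[:-1]), labels[-1], ""


# Private rules off: only 'de' matches, so 'com' is taken as the domain.
print(split_host("forums.test123.com.de"))
# Private rules on: 'com.de' is the longest match.
print(split_host("forums.test123.com.de", include_private=True))
```

With the private rules off, the sketch reproduces the split reported above ('forums.test123', 'com', 'de'). With them on, it yields 'forums', 'test123', 'com.de', which is what the extractor built with include_psl_private_domains=True should return for the same URL, assuming both suffixes are present in the list snapshot tldextract is using.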