john-kurkowski/tldextract

[BUG] - Parsing error on URLs ending in ca.com (e.g: geteduca.com)

Closed this issue · 2 comments

mdolr commented

Hello,

My scraper encountered a bug with what seemed like a normal URL.

Actual behavior

URL: https://www.geteduca.com failed to be parsed by tldextract.extract(url); the result was ExtractResult(subdomain='', domain='', suffix='').

Expected behavior

I would expect to receive ExtractResult(subdomain='www', domain='geteduca', suffix='com')

Thank you for looking into it, I'll try to submit a fix if I have the time 😄

I'm getting the correct results on CPython 3.10.11 and PyPy 3.9.16.

import tldextract; tldextract.extract("https://www.geteduca.com")

Can you let us know your Python version and verify if you are using tldextract >=3.4.4?
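For anyone hitting something similar, here is a minimal sketch for collecting the details asked for above using only the standard library. `environment_report` is a hypothetical helper name, not part of tldextract; it only reads the installed package metadata and reports `None` when the package is not installed.

```python
import platform
from importlib import metadata


def environment_report(package: str = "tldextract") -> dict:
    """Collect the interpreter and installed package version for a bug report."""
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        package_version = None  # package is not installed in this environment
    return {
        "implementation": platform.python_implementation(),  # e.g. CPython or PyPy
        "python": platform.python_version(),
        package: package_version,
    }


if __name__ == "__main__":
    print(environment_report())
```

Pasting the printed dictionary into the issue makes it easy to compare against the versions that parse the URL correctly.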

mdolr commented

Hey, sorry: I've updated my packages and Python, and I can no longer reproduce it. I don't know what happened :/