[BUG] - Parsing error on URLs ending in ca.com (e.g: geteduca.com)

Question

Hello,

My scraper encountered a bug with what looked like a normal URL.

Actual behavior

The URL https://www.geteduca.com failed to be parsed by tldextract.extract(url), which returned ExtractResult(subdomain='', domain='', suffix='').

Expected behavior

I would expect to receive ExtractResult(subdomain='www', domain='geteduca', suffix='com')

Thank you for looking into it, I'll try to submit a fix if I have the time 😄

Answer 1 · 2023-06-29T12:22:47.000Z

I'm getting the correct results on CPython 3.10.11 and PyPy 3.9.16.

import tldextract; tldextract.extract("https://www.geteduca.com")

Can you let us know your Python version and confirm that you are using tldextract >= 3.4.4?
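A small script along these lines could gather the details requested above in one run. It is a sketch, not part of tldextract: the helper names `report_environment` and `check_url` are hypothetical. It reads the installed tldextract version with the standard-library `importlib.metadata`, then parses the reported URL with `tldextract.TLDExtract(suffix_list_urls=())`, which skips the network fetch and uses the Public Suffix List snapshot bundled with the package. Forcing the snapshot is an assumption about what is worth isolating: if this returns the expected result while the default `tldextract.extract` does not, a stale or failed suffix-list download becomes a plausible suspect.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def report_environment():
    """Return (Python version, installed tldextract version or None)."""
    try:
        tld_version = version("tldextract")
    except PackageNotFoundError:
        # tldextract is not installed in this interpreter's environment.
        tld_version = None
    return sys.version.split()[0], tld_version


def check_url(url="https://www.geteduca.com"):
    """Parse `url` using only the suffix list bundled with tldextract."""
    import tldextract  # third-party: pip install tldextract

    # suffix_list_urls=() disables the online fetch, so the bundled
    # Public Suffix List snapshot is used instead.
    extractor = tldextract.TLDExtract(suffix_list_urls=())
    return extractor(url)


if __name__ == "__main__":
    py_version, tld_version = report_environment()
    print(f"Python {py_version}, tldextract {tld_version}")
    if tld_version is not None:
        print(check_url())
```

Pasting the two printed lines into the issue answers both questions at once and shows whether the offline parse matches the expected ExtractResult(subdomain='www', domain='geteduca', suffix='com').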

Answer 2 · 2023-07-04T15:32:50.000Z

Hey, sorry. I've updated my packages and Python, and I can no longer reproduce it. I don't know what happened :/