john-kurkowski/tldextract

platform-dependent parsing of IP addresses

jmdacruz opened this issue · 1 comments

Try the following on macOS (note the leading 0 in the last segment of the IP address):

>>> import tldextract
>>> tldextract.extract('127.0.0.01')
ExtractResult(subdomain='', domain='127.0.0.01', suffix='', is_private=False)
>>>
>>> from urllib.parse import urlparse
>>> urlparse('127.0.0.01')
ParseResult(scheme='', netloc='', path='127.0.0.01', params='', query='', fragment='')

And then on Linux:

>>> import tldextract
>>> tldextract.extract('127.0.0.01') # note difference in output here
ExtractResult(subdomain='127.0.0', domain='01', suffix='', is_private=False)
>>> 
>>> from urllib.parse import urlparse
>>> urlparse('127.0.0.01') # urllib remains consistent in Linux and Mac
ParseResult(scheme='', netloc='', path='127.0.0.01', params='', query='', fragment='')

Several languages have recently started validating these IP addresses correctly (there are security concerns associated with this, e.g., GHSA-38h6-vxp4-qxvm).
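For illustration, here is a minimal sketch of a dotted-quad check that does not depend on the platform's C library, so it would give the same answer on Linux and macOS. The function name `looks_like_strict_ipv4` is hypothetical and is not part of tldextract; it only shows the "no leading zeroes" rule being discussed.

```python
def looks_like_strict_ipv4(host: str) -> bool:
    """Return True only for four decimal octets (0-255) without leading zeroes.

    Hypothetical helper for illustration; not tldextract's implementation.
    """
    parts = host.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Accept ASCII digits only; this also rejects empty segments and signs.
        if not (part.isascii() and part.isdigit()):
            return False
        # Reject leading zeroes such as "01", but allow a lone "0".
        if len(part) > 1 and part[0] == "0":
            return False
        if int(part) > 255:
            return False
    return True
```

With a check like this, '127.0.0.1' is accepted and '127.0.0.01' is rejected regardless of the operating system.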

Environment:

  • Python 3.11.4
  • tldextract version 5.1.1
  • Linux Debian 5.10.197-1
  • MacOS Sonoma

Thanks for the bug report. I can confirm that it affects older macOS versions like Big Sur.

socket.inet_pton on macOS also erroneously accepts leading zeroes in the IPv4 part of IPv6 (dual) addresses, like 2001:db8:3333:4444:5555:6666:1.2.3.04.
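A small probe along these lines can be used to see how a given platform's socket.inet_pton treats both inputs. The helper name `inet_pton_accepts` is made up for this sketch. The results for the two leading-zero addresses differ by platform, so no expected output is shown.

```python
import socket


def inet_pton_accepts(family: int, address: str) -> bool:
    """Return True if this platform's inet_pton parses `address`.

    Hypothetical helper for illustration only.
    """
    try:
        socket.inet_pton(family, address)
    except OSError:
        # inet_pton raises OSError for strings it considers invalid.
        return False
    return True


if __name__ == "__main__":
    # The outcome for these inputs depends on the platform's C library.
    probes = [
        (socket.AF_INET, "127.0.0.01"),
        (socket.AF_INET6, "2001:db8:3333:4444:5555:6666:1.2.3.04"),
    ]
    for family, address in probes:
        print(address, inet_pton_accepts(family, address))
```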