platform-dependent parsing of IP addresses
jmdacruz opened this issue · 1 comments
jmdacruz commented
Try the following on macOS (note the leading 0 in the last segment of the IP address):
>>> import tldextract
>>> tldextract.extract('127.0.0.01')
ExtractResult(subdomain='', domain='127.0.0.01', suffix='', is_private=False)
>>>
>>> from urllib.parse import urlparse
>>> urlparse('127.0.0.01')
ParseResult(scheme='', netloc='', path='127.0.0.01', params='', query='', fragment='')
And then on Linux:
>>> import tldextract
>>> tldextract.extract('127.0.0.01') # note difference in output here
ExtractResult(subdomain='127.0.0', domain='01', suffix='', is_private=False)
>>>
>>> from urllib.parse import urlparse
>>> urlparse('127.0.0.01') # urllib remains consistent in Linux and Mac
ParseResult(scheme='', netloc='', path='127.0.0.01', params='', query='', fragment='')
Several languages have recently started to validate these IP addresses correctly (there are security concerns associated with this, e.g., GHSA-38h6-vxp4-qxvm).
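For illustration, a strict dotted-decimal check that rejects leading zeros regardless of platform might look like the sketch below. The helper name is_strict_ipv4 is hypothetical; it is not part of tldextract or the standard library, and it only shows the kind of validation being described.

```python
def is_strict_ipv4(text: str) -> bool:
    """Return True only for four decimal octets without leading zeros.

    Hypothetical helper for illustration; pure Python, so the result
    does not depend on the operating system's address parser.
    """
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Each octet must be 1 to 3 ASCII digits.
        if not (part.isascii() and part.isdigit()) or len(part) > 3:
            return False
        # Reject leading zeros such as "01", but allow a lone "0".
        if len(part) > 1 and part[0] == "0":
            return False
        if int(part) > 255:
            return False
    return True


print(is_strict_ipv4("127.0.0.1"))   # True
print(is_strict_ipv4("127.0.0.01"))  # False
```

With a check like this, '127.0.0.01' is treated the same way on Linux and macOS.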
Environment:
- Python 3.11.4
- tldextract version 5.1.1
- Linux Debian 5.10.197-1
- macOS Sonoma
elliotwutingfeng commented
Thanks for the bug report. I can confirm that it affects older macOS versions like Big Sur.
socket.inet_pton on macOS also erroneously accepts leading zeroes in the IPv4 part of IPv6 (dual) addresses, like 2001:db8:3333:4444:5555:6666:1.2.3.04.
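To see how a given platform behaves, a small probe along these lines could be run on each machine. This is a minimal sketch: inet_pton_accepts is a hypothetical helper name, and the results for the leading-zero inputs depend on the operating system, so no expected output is shown for them.

```python
import socket


def inet_pton_accepts(address: str, family: int = socket.AF_INET) -> bool:
    """Report whether this platform's socket.inet_pton parses the address.

    Hypothetical helper for illustration only.
    """
    try:
        socket.inet_pton(family, address)
    except (OSError, ValueError):
        # OSError: the platform rejected the address string.
        # ValueError: the string could not be passed through (e.g. a null byte).
        return False
    return True


# Results for these two inputs vary by platform, as reported in this issue.
print(inet_pton_accepts("127.0.0.01"))
print(inet_pton_accepts("2001:db8:3333:4444:5555:6666:1.2.3.04", socket.AF_INET6))
```

Comparing the printed values on Linux and macOS shows whether the local inet_pton tolerates the leading zero.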