Incomplete support for Non-decimal notation
cowsay1 opened this issue · 8 comments
The library supports some non-decimal notations, but many formats supported by the ping and browsers do not work here using addressFromString($ip, true, true, true)
left is result of ping, right is "valid" ip notation:
0.0.0.0 0 (short to 1 octet, actually decimal:)
0.0.0.10 10
0.0.0.8 010 (octal)
0.0.0.10 0xa (hex)
127.0.0.1 127.1 (2 octets)
10.0.0.8 10.010
127.0.0.1 0177.0x01
127.0.0.1 127.0.1 (3 octets)
127.0.0.1 017700000001 (all-in-one octal)
127.0.0.1 00000000000000000000000000000000000000000000000000000017700000001 (all-in-one octal with infinite leading zeroes)
127.0.0.1 0x7f000001 (all-in-one hex)
127.0.0.1 0x00000000000000000000000000000000000000000000000000000007f000001 (all-in-one hex with infinite leading zeroes)
10.8.0.17 10.010.0x000000000000000011
not sure if i missed some
I did some experiments to better understand how these IPs get parsed by different libc functions. The best documentation on the supported formats I could find was a technical reference on the inet_addr Subroutine via the PHP doc on ip2long()
, which includes a BNF and some examples on corner cases.
PHP itself exposes two of libc's functions to deal with IP strings: ip2long()
and inet_pton()
. Testing these on the examples provided by OP I found that neither of these supports any of these on my system. As commentor Karl Rixon noted:
The manual states that "ip2long() will also work with non-complete IP addresses", however this is system-dependant so cannot be relied upon. [...]
This is because ip2long will use inet_pton if available, which does not support non-complete addresses. If inet_pton is not available on your system, inet_addr will be used and incomplete addresses will work as stated.
With this in mind, I wrote a little tool to be able to use inet_addr directly and extended my test script to include it in the comparison:
libc-inet-parse.c:
#include <stdio.h>
#include <netinet/in.h>
#include <arpa/inet.h>
int main(int argc, char **argv) {
if (argc != 2) {
printf("Usage: %s <ip address to parse>\n", argv[0]);
return 1;
}
unsigned long addr = inet_addr(argv[1]);
if (addr == INADDR_NONE) {
printf("not supported\n");
} else {
struct in_addr paddr;
paddr.s_addr = addr;
printf("%s\n", inet_ntoa(paddr));
}
return 0;
}
libc-parse.php:
<?php
$ips = [
'0',
'10',
'010',
'0xa',
'127',
'127.1',
'10.010',
'0177.0x01',
'127.0.1',
'017700000001',
'00000000000000000000000000000000000000000000000000000017700000001',
'0x7f000001',
'0x00000000000000000000000000000000000000000000000000000007f000001',
'10.010.0x000000000000000011',
];
printf("%80s\t%20s\t%20s\t%20s\n", 'IP', 'inet_pton', 'ip2long', 'inet_addr');
printf("%'-80s\t%'-20s\t%'-20s\t%'-20s\n", '', '', '', '');
foreach ($ips as &$ip) {
$result1 = inet_pton($ip);
$result1 = is_string($result1) ? inet_ntop($result1) : 'not supported';
$result2 = ip2long($ip);
$result2 = $result2 !== false ? long2ip($result2) : 'not supported';
$result3 = trim(shell_exec('./libc-inet-parse '.$ip));
printf("%80s\t%20s\t%20s\t%20s\n", $ip, $result1, $result2, $result3);
}
Providing this output:
$ gcc libc-inet-parse.c -o libc-inet-parse
$ php libc-parse.php
IP inet_pton ip2long inet_addr
-------------------------------------------------------------------------------- -------------------- -------------------- --------------------
127.0.0.1 127.0.0.1 127.0.0.1 127.0.0.1
0 not supported not supported 0.0.0.0
10 not supported not supported 0.0.0.10
010 not supported not supported 0.0.0.8
0xa not supported not supported 0.0.0.10
127 not supported not supported 0.0.0.127
127.1 not supported not supported 127.0.0.1
10.010 not supported not supported 10.0.0.8
0177.0x01 not supported not supported 127.0.0.1
127.0.1 not supported not supported 127.0.0.1
017700000001 not supported not supported 127.0.0.1
00000000000000000000000000000000000000000000000000000017700000001 not supported not supported 127.0.0.1
0x7f000001 not supported not supported 127.0.0.1
0x00000000000000000000000000000000000000000000000000000007f000001 not supported not supported 127.0.0.1
10.010.0x000000000000000011 not supported not supported 10.8.0.17
Great analysis!
BTW I think we should also test musl (used for example by termux and alpine), which seems to behave differently vs glibc: in the example above 127.1
is accepted, but 127
isn't...
In musl, inet_addr()
is just a wrapper for inet_aton()
, probably kept for API compatibility:
http://git.musl-libc.org/cgit/musl/tree/src/network/inet_addr.c
Given the quirks of PHP's ip2long()
, considering the musl implementation linked above, and other comments I've seen, that warn against the use of inet_addr()
, it seems that newer applications prefer the use of inet_pton()
or inet_aton()
, both of which also support IPv6, but not necessarily the abbreviated syntax or all it's variants.
Personally, I've mostly seen the abbreviated decimal notation used in service configuration files, for example when specifying a bind address. Writing just 0
(for all interfaces) or 127.1
/::1
(only localhost) is a convenient shorthand. The octal and hex notations may be less common, but it could make sense to implement them in a universal IP parsing library, allowing it to cover many different use cases.
See also https://man7.org/linux/man-pages/man3/inet_addr.3.html
It seems that using 1 to 3 numbers in IPv4 is not like assuming zeroes:
What would be the safest way to deal with incoming IP addresses in different decimal / octal / hex formats? My plan is to just deny any hosts that match /^0x[a-f0-9]+$/i
or /^0[0-9.]+$/
or /^\d+$/
outright and require folks provide an IP address that's otherwise parseable by ip-lib.
Ideally only '127.0.0.1' of these would be allowed:
http://127.0.0.1
http://0x7f000001
http://0177.0000.0000.0001
http://17700000001
http://2130706433
What would be the safest way to deal with incoming IP addresses in different decimal / octal / hex formats? My plan is to just deny any hosts that match /^0x[a-f0-9]+$/i or /^0[0-9.]+$/ or /^\d+$/ outright and require folks provide an IP address that's otherwise parseable by ip-lib.
@KorvinSzanto Sorry, I completely missed your question.
This will be possible when #73 will be ready and merged.
You'll be able to do something like
use IPLib\Factory;
use IPLib\ParseStringFlag;
// Let's parse $input, accepting only the standard IP notation
$ip = Factory::parseAddressString($input);
// Let's parse $input, accepting alternative IP notations
$alternativeIP = Factory::parseAddressString($input, ParseStringFlag::IPV4_MAYBE_NON_DECIMAL | ParseStringFlag::IPV4ADDRESS_MAYBE_NON_QUAD_DOTTED);
if ($alternativeIP !== null && (string) $alternativeIP !== (string) $ip) {
// You may want to refuse $input
}
// Here $ip is either NULL (if $input is not valid) or an `IPLib\Address\AddressInterface` if it's in "standard" form
@KorvinSzanto I just published a new version which allows you to do what I described above.