mlocati/ip-lib

Incomplete support for Non-decimal notation

cowsay1 opened this issue · 8 comments

The library supports some non-decimal notations, but many formats supported by the ping and browsers do not work here using addressFromString($ip, true, true, true)

left is result of ping, right is "valid" ip notation:

  0.0.0.0	0 (short to 1 octet, actually decimal:)
 0.0.0.10	10
  0.0.0.8	010 (octal)
 0.0.0.10	0xa (hex)
127.0.0.1	127.1 (2 octets)
 10.0.0.8	10.010
127.0.0.1	0177.0x01
127.0.0.1	127.0.1 (3 octets)
127.0.0.1	017700000001 (all-in-one octal)
127.0.0.1	00000000000000000000000000000000000000000000000000000017700000001 (all-in-one octal with infinite leading zeroes)
127.0.0.1	0x7f000001 (all-in-one hex)
127.0.0.1	0x00000000000000000000000000000000000000000000000000000007f000001 (all-in-one hex with infinite leading zeroes)
10.8.0.17	10.010.0x000000000000000011

not sure if i missed some

I'm not sure that a single number is valid in every system...
For example, in termux we have:
Screenshot_20210528-231036_Termux

I did some experiments to better understand how these IPs get parsed by different libc functions. The best documentation on the supported formats I could find was a technical reference on the inet_addr Subroutine via the PHP doc on ip2long(), which includes a BNF and some examples on corner cases.

PHP itself exposes two of libc's functions to deal with IP strings: ip2long() and inet_pton(). Testing these on the examples provided by OP I found that neither of these supports any of these on my system. As commentor Karl Rixon noted:

The manual states that "ip2long() will also work with non-complete IP addresses", however this is system-dependant so cannot be relied upon. [...]
This is because ip2long will use inet_pton if available, which does not support non-complete addresses. If inet_pton is not available on your system, inet_addr will be used and incomplete addresses will work as stated.

With this in mind, I wrote a little tool to be able to use inet_addr directly and extended my test script to include it in the comparison:

libc-inet-parse.c:

#include <stdio.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(int argc, char **argv) {
    if (argc != 2) {
        printf("Usage: %s <ip address to parse>\n", argv[0]);
        return 1;
    }
    unsigned long addr = inet_addr(argv[1]);
    if (addr == INADDR_NONE) {
        printf("not supported\n");
    } else {
        struct in_addr paddr;
        paddr.s_addr = addr;
        printf("%s\n", inet_ntoa(paddr));
    }

    return 0;
}

libc-parse.php:

<?php

$ips = [
    '0',
    '10',
    '010',
    '0xa',
    '127',
    '127.1',
    '10.010',
    '0177.0x01',
    '127.0.1',
    '017700000001',
    '00000000000000000000000000000000000000000000000000000017700000001',
    '0x7f000001',
    '0x00000000000000000000000000000000000000000000000000000007f000001',
    '10.010.0x000000000000000011',
];

printf("%80s\t%20s\t%20s\t%20s\n", 'IP', 'inet_pton', 'ip2long', 'inet_addr');
printf("%'-80s\t%'-20s\t%'-20s\t%'-20s\n", '', '', '', '');
foreach ($ips as &$ip) {
    $result1 = inet_pton($ip);
    $result1 = is_string($result1) ? inet_ntop($result1) : 'not supported';

    $result2 = ip2long($ip);
    $result2 = $result2 !== false ? long2ip($result2) : 'not supported';

    $result3 = trim(shell_exec('./libc-inet-parse '.$ip));

    printf("%80s\t%20s\t%20s\t%20s\n", $ip, $result1, $result2, $result3);
}

Providing this output:

$ gcc libc-inet-parse.c -o libc-inet-parse
$ php libc-parse.php
                                                                              IP	           inet_pton	             ip2long	           inet_addr
--------------------------------------------------------------------------------	--------------------	--------------------	--------------------
                                                                       127.0.0.1	           127.0.0.1	           127.0.0.1	           127.0.0.1
                                                                               0	       not supported	       not supported	             0.0.0.0
                                                                              10	       not supported	       not supported	            0.0.0.10
                                                                             010	       not supported	       not supported	             0.0.0.8
                                                                             0xa	       not supported	       not supported	            0.0.0.10
                                                                             127	       not supported	       not supported	           0.0.0.127
                                                                           127.1	       not supported	       not supported	           127.0.0.1
                                                                          10.010	       not supported	       not supported	            10.0.0.8
                                                                       0177.0x01	       not supported	       not supported	           127.0.0.1
                                                                         127.0.1	       not supported	       not supported	           127.0.0.1
                                                                    017700000001	       not supported	       not supported	           127.0.0.1
               00000000000000000000000000000000000000000000000000000017700000001	       not supported	       not supported	           127.0.0.1
                                                                      0x7f000001	       not supported	       not supported	           127.0.0.1
               0x00000000000000000000000000000000000000000000000000000007f000001	       not supported	       not supported	           127.0.0.1
                                                     10.010.0x000000000000000011	       not supported	       not supported	           10.8.0.17

Great analysis!

BTW I think we should also test musl (used for example by termux and alpine), which seems to behave differently vs glibc: in the example above 127.1 is accepted, but 127 isn't...

In musl, inet_addr() is just a wrapper for inet_aton(), probably kept for API compatibility:
http://git.musl-libc.org/cgit/musl/tree/src/network/inet_addr.c

Given the quirks of PHP's ip2long(), considering the musl implementation linked above, and other comments I've seen, that warn against the use of inet_addr(), it seems that newer applications prefer the use of inet_pton() or inet_aton(), both of which also support IPv6, but not necessarily the abbreviated syntax or all it's variants.

Personally, I've mostly seen the abbreviated decimal notation used in service configuration files, for example when specifying a bind address. Writing just 0 (for all interfaces) or 127.1/::1 (only localhost) is a convenient shorthand. The octal and hex notations may be less common, but it could make sense to implement them in a universal IP parsing library, allowing it to cover many different use cases.

See also https://man7.org/linux/man-pages/man3/inet_addr.3.html

It seems that using 1 to 3 numbers in IPv4 is not like assuming zeroes:

Screenshot_20210530-145535_Termux

What would be the safest way to deal with incoming IP addresses in different decimal / octal / hex formats? My plan is to just deny any hosts that match /^0x[a-f0-9]+$/i or /^0[0-9.]+$/ or /^\d+$/ outright and require folks provide an IP address that's otherwise parseable by ip-lib.

Ideally only '127.0.0.1' of these would be allowed:

http://127.0.0.1
http://0x7f000001
http://0177.0000.0000.0001
http://17700000001
http://2130706433

What would be the safest way to deal with incoming IP addresses in different decimal / octal / hex formats? My plan is to just deny any hosts that match /^0x[a-f0-9]+$/i or /^0[0-9.]+$/ or /^\d+$/ outright and require folks provide an IP address that's otherwise parseable by ip-lib.

@KorvinSzanto Sorry, I completely missed your question.
This will be possible when #73 will be ready and merged.

You'll be able to do something like

use IPLib\Factory;
use IPLib\ParseStringFlag;

// Let's parse $input, accepting only the standard IP notation
$ip = Factory::parseAddressString($input);

// Let's parse $input, accepting alternative IP notations
$alternativeIP = Factory::parseAddressString($input, ParseStringFlag::IPV4_MAYBE_NON_DECIMAL | ParseStringFlag::IPV4ADDRESS_MAYBE_NON_QUAD_DOTTED);
if ($alternativeIP !== null && (string) $alternativeIP !== (string) $ip) {
    // You may want to refuse $input
}
// Here $ip is either NULL (if $input is not valid) or an `IPLib\Address\AddressInterface` if it's in "standard" form

@KorvinSzanto I just published a new version which allows you to do what I described above.