reactphp/socket

Improve TLS 1.3 support

clue opened this issue ยท 7 comments

clue commented

TLS 1.3 is now an official standard as of August 2018 (https://tools.ietf.org/html/rfc8446) which is great news! ๐ŸŽ‰ See https://wiki.openssl.org/index.php/TLS1.3 if you want to learn more about why this is great news.

OpenSSL 1.1.1 supports TLS 1.3 (https://www.openssl.org/blog/blog/2017/05/04/tlsv1.3/). For example, this version ships with Ubuntu 18.10 (and newer) by default, meaning that recent installations support TLS 1.3 out of the box :shipit:

At the time of writing this, PHP does not know about TLS 1.3 at all, there is however a pending PR that adds TLS 1.3 support for a future PHP version (php/php-src#3700).

Interestingly, due to the way how PHP interfaces with OpenSSL, this means that TLS 1.3 is in fact enabled by default for all client and server connections when using a recent OpenSSL version. OpenSSL assumes all protocols are supported by default and PHP simply doesn't ever tell OpenSSL to change anything about TLS 1.3. In the above PR this would be addressed via https://github.com/php/php-src/pull/3700/files#diff-fba6f2ad888bf4d71a91b060dfee4522R1005, meaning that a future PHP version will provide a way to explicitly disable TLS 1.3 (which might make sense for compatibility reasons). See also https://www.openssl.org/docs/manmaster/man3/SSL_CTX_set_options.html for more details about its API. Whether TLS 1.3 should be supported by default appears to be in discussion, but I consider this to be out of scope for this ticket.

Unfortunately, this currently results in some half-baked TLS 1.3 support using latest PHP versions with a recent OpenSSL version. For example, this means that running our current test suite on Ubuntu 18.10 will hang with 100% CPU usage. This can easily be reproduced in a default installation, see also https://gist.github.com/clue/20de5b345dc10c204c0da8f357d6a84d if you want to reproduce this with or without Docker.

I've been able to trace this down to what I consider a bug in PHP's stream_get_contents() function. If you call stream_get_contents() on a TLS 1.3 connection with a maximum length given, it will hang with 100% CPU usage until the remote end actually sends some application payload data. This happens only when the remote end doesn't send any application data and despite the socket being set to non-blocking mode. Apparently, this happens because the TLS handshake was changed in such a way that on the client side PHP sees some "empty" payload messages immediately after the initial handshake due to the underlying socket reporting data for this very handshake and stream_get_contents() with a maximum length given appears to want to wait to actually return some payload data (strace reveals repeated read() operations here).

In other words, this can be reproduced by connecting to any server that supports TLS 1.3 that does not immediately send data (which would apply to HTTP). For example, the following code will open a TLS 1.3 connection and then hang with out 100% CPU usage inside stream_get_contents():

<?php

//$address = 'tls://127.0.0.1:8000';
$address = 'tls://gmail.com:443';

//stream_context_set_default(array('ssl' => array('verify_peer' => false)));
$socket = stream_socket_client($address);
stream_set_blocking($socket, false);

var_dump(stream_get_meta_data($socket));

while (true) {
    $n = null;
    $r = array($socket);

    if (!stream_select($r, $n, $n, null)) {
        break;
    }

    // this hangs only when a maximum length is given:
    $data = stream_get_contents($socket, 1000);

    echo '[' . strlen($data) . ': ' . $data . ']';

    if (feof($socket)) {
        echo 'END';
        break;
    }
}

Make sure to check its output, on affected OpenSSL versions this output would start like this and then hang with 100% CPU usage:

array(8) {
  'crypto' =>
  array(4) {
    'protocol' =>
    string(7) "UNKNOWN"
    'cipher_name' =>
    string(22) "TLS_AES_256_GCM_SHA384"
    'cipher_bits' =>
    int(256)
    'cipher_version' =>
    string(7) "TLSv1.3"
  }
  โ€ฆ

I consider this to be a bug in PHP's stream_get_contents() function and we should look into fixing this upstream in PHP. While I can see the reasoning for why it behaves like this at the moment, arguably it shouldn't cause this CPU load (potential DOS attack).

As a work around, I've been able to avoid this problem by not passing a maximum length to this function. This will make this function return an empty string immediately instead of causing high CPU load. Unfortunately, ReactPHP's streams currently consider this to be an EOF situation and hence close this connection resource, but this will be addressed via reactphp/stream#139. With these patches applied locally, I've been able to successfully use TLS 1.3 on my machine. Therefor I will look into pushing this forward so we can provide limited TLS 1.3 support in ReactPHP for now and eventually full TLS 1.3 support once PHP catches up.

As an intermediary work around, I've been able to manually set PHP's "crypto_method" and/or manually specify ciphers so that PHP no longer negotiates TLS 1.3 with the remote end. I do not consider this to be a viable solution, but it at least allows TLS 1.2 (or lower) connections on affected versions without causing this CPU load.

I will continue working on this and will update this ticket as we progress. Any input is appreciated! ๐Ÿ‘

clue commented

I consider this to be a bug in PHP's stream_get_contents() function and we should look into fixing this upstream in PHP. While I can see the reasoning for why it behaves like this at the moment, arguably it shouldn't cause this CPU load (potential DOS attack).

Thanks to @kelunik for reporting this upstream (https://bugs.php.net/bug.php?id=77390) and filing an upstream PR to address this (php/php-src#3729)! ๐Ÿ‘ This means there's a fair chance this will be fixed in a PHP version not too far in the future. In the meantime, we will still continue working on providing a work-around for affected versions as described above.

@clue That fix is for feof hanging. Does stream_get_contents use that internally, too? I haven't verified that, yet.

clue commented

As a work around, I've been able to avoid this problem by not passing a maximum length to this function. This will make this function return an empty string immediately instead of causing high CPU load. Unfortunately, ReactPHP's streams currently consider this to be an EOF situation and hence close this connection resource, but this will be addressed via reactphp/stream#139. With these patches applied locally, I've been able to successfully use TLS 1.3 on my machine. Therefor I will look into pushing this forward so we can provide limited TLS 1.3 support in ReactPHP for now and eventually full TLS 1.3 support once PHP catches up.

This workaround has now been implemented via #186 and there are no known issues with it ๐Ÿ‘

@kelunik: @clue That fix is for feof hanging. Does stream_get_contents use that internally, too? I haven't verified that, yet.

@kelunik I may have misunderstood our IRC conversation, so I haven't verified this either. That being said, the above workaround does not require this fix, so I don't have an immediate use case anymore.

@clue While #186 is a workaround, reactphp/stream#139 introduced feof on writing, which might now block. The reason for stream_get_contents blocking is feof blocking, I have now verified this.

I guess the workaround in #186 should only be done for TLS streams, as it might consume quite some memory on unix pipes with a lot of data available?

clue commented

@clue While #186 is a workaround, reactphp/stream#139 introduced feof on writing, which might now block. The reason for stream_get_contents blocking is feof blocking, I have now verified this.

@kelunik Thanks for verifying and confirming what we've only suspected so far ๐Ÿ‘ The way I understand this means that both variants use feof() and this workaround is confirmed to at least allow us to communicate with well behaving TLS 1.3 peers. The discussion linked in PHP bug php/php-src#3223 suggests that either variant may still be broken if any TLS peer (intentionally) uses fragmented TLS records and thus may still cause feof() to block under certain circumstances (possible DoS vector). Unless I'm missing something here, I think that the suggested workaround still makes sense for now and we should look into getting this fixed upstream in PHP ๐Ÿ‘

I guess the workaround in #186 should only be done for TLS streams, as it might consume quite some memory on unix pipes with a lot of data available?

I've verified that Unix domain sockets (UDS) use a default of 128 KiB on recent Ubuntu versions, unless explicitly configured via send and receive buffers. I've verified this is limited by a kernel setting which happens to be 208 KiB on recent Ubuntu versions. It's my understanding that this is "good enough" for now and users explicitly overwriting these low-level settings likely want to receive larger chunks (https://stackoverflow.com/questions/21856517/whats-the-practical-limit-on-the-size-of-single-packet-transmitted-over-domain).

I'd rather fix the 80% use case and get this release out. We can always address any follow-up questions once we actually notice any unexpected issues ๐Ÿ‘

I am currently working on what I believe is an instance of feof blocking:
phpredis/phpredis#1726

In this case, PhpRedis (a php redis client) is blocking on all TLS 1.3 connections while not blocking on any TLS 1.2 connections.

clue commented

I am currently working on what I believe is an instance of feof blocking: phpredis/phpredis#1726

In this case, PhpRedis (a php redis client) is blocking on all TLS 1.3 connections while not blocking on any TLS 1.2 connections.

@inieves Interesting find and good analysis in your linked ticket!

For ReactPHP, we're no longer aware of any issues related to TLS 1.3 across a wide PHP version range including legacy PHP up to the latest PHP 8.1 release. I'm not intimately familiar with the phpredis code base, so unfortunately I don't see much that springs to mind that could cause this behavior you're seeing. In either case, here are the relevant tickets how we resolved this for ReactPHP:

I hope this helps ๐Ÿ‘