reactphp-legacy/socket-client

SSL Buffer Drain: The Redux

Closed this issue ยท 23 comments

Related to #14, #24, #26...

Yesterday, I updated a project to v0.4.4, and was disappointed to find the SSL buffer issue rearing its ugly head, again. So much so, in fact, that I wanted to see if I could pinpoint which specific versions were affected. I've compiled a table here: https://gist.github.com/mcrumm/49d6ba5e5647429b9689

The list is by no means exhaustive, but it was a bit exhausting... ๐Ÿ˜ƒ

To test this issue, I'm connecting to Slack's RealTime Messaging API via Pawl, and then waiting for messages. This has been discussed at length in other tickets, so the short version is that at least php-5.6.9 still suffers from the SSL buffer drain problem, and should likely be using fread. Additionally, there are versions < 5.6.8 that work without wrapSecure, but I don't know how much concern there is for targeting behavior for specific minor point releases.

PHP7 does not appear to suffer from this issue. Hurray.

The most problematic versions (of those tested):

  • 5.5.23
  • 5.6.7
  • 5.6.8 (Supposed source of the bugfix)

The fact that I couldn't get 5.6.8 working at all made me start to question the integrity of my tests, since that was the version that I had thought fixed the issue, so if someone else confirms 5.6.8 as working, I'll assume my build is the problem.

Is anyone else seeing similar results?

Excellent debugging and reporting. This issue makes me want to cry (again) ๐Ÿ˜ข

clue commented

Thanks for digging into this @mcrumm! IMO this underlines we need a dedicated test case for this, so we can reproduce and avoid any regressions for the future. I'll look into filing a new version of #16.

@cboden @clue Glad I could help. It feels like internals has to play a game of KerPlunk every time there's an SSL issue.

Kerplunk!

Thanks for all your hard work on this!

clue commented

Thanks for your effort @mcrumm, this certainly helps! ๐Ÿ‘

I've just updated our test suite in #62 to cover the buffering issues.

Also, I've just (temporarily!) triggered Travis to run the updated test suite against a wide set of PHP versions: See https://travis-ci.org/reactphp/socket-client/builds/115517562

Note that (for obvious reasons) Travis is not designed to run 50+ builds, so we should probably not run this on a regular basis. This also happens to cause Travis to error on quite a few jobs (arbitrary files failed to download), triggering a restart has so far resolved most issues here.

Perhaps you can help reduce this list to the relevant versions? ๐Ÿ‘

I reached out to to the people at https://3v4l.org and received a reply back from Sjon who's offered to give us a hand. 3v4l executes your PHP scripts on over 150 different PHP versions but there is 0 network access. Sjon has offered to run some scripts (such as this one) privately with a network connection for us.

If we can get a standalone PHP script (without React) to showcase the SSL buffer issue we can get a clear indication on which versions have the bug.

FYI @mcrumm I've just released Pawl/0.2

clue commented

Being able to run tests on 3v4l would certainly be nice! However, adding a minimal test case which highlights this issue in the underlying stream API doesn't really prove we've successfully worked around this in our library, so I'm not sure this actually helps mitigating this issue in the future.

As an alternative, we can use (abuse?) Travis to run our test suite against any PHP versions like this: 2d0d85b

We probably don't want to run this in a regular basis, but can easily trigger a build on this occasionally if we can reduce this list to relevant versions.

3v4l will first let us see which versions have the issue and we can follow up with potential fixes to run on 3v4l again to see if we've successfully targeted the issue.

clue commented

Hey @mcrumm, this issue should now be covered by the test suite (#62). Can you reproduce this by running the test suite?

@clue Thanks for the update! I can give that a try later today and report back.

@clue The integration tests do indeed cover the issue. I ran the test suite against the problematic PHP versions from my original trials. Here are the results:

PHP Tests Notes
5.5.23 BAD PHPUnit hangs while running integration tests
5.5.38 (latest) GOOD
5.6.7 BAD PHPUnit hangs while running integration tests
5.6.8 BAD PHPUnit hangs while running integration tests
5.6.9 GOOD
5.6.24 (latest) GOOD
7.0.8 GOOD
7.0.9 GOOD

One test, however, exhibited some strange behavior. TcpConnectorTest::connectionToTcpServerShouldSucceed would intermittently fail on (at least) 5.6.9, 7.0.8, and 7.0.9.

$ TEST_SECURE=6001 TEST_PLAIN=6000 phpunit
PHPUnit 5.2.12 by Sebastian Bergmann and contributors.

................F......                                           23 / 23 (100%)

Time: 1.25 seconds, Memory: 6.00MB

There was 1 failure:

1) React\Tests\SocketClient\TcpConnectorTest::connectionToTcpServerShouldSucceed
Expectation failed for method name is equal to <string:__invoke> when invoked 1 time(s).
Method was expected to be called 1 times, actually called 0 times.

FAILURES!
Tests: 23, Assertions: 30, Failures: 1.

I also captured the stunnel output from one of the hanging test runs (5.6.8):

$ stunnel3 -f -p stunnel.pem -d 6001 -r 6000
2016.08.02 15:31:33 LOG5[ui]: stunnel 5.35 on x86_64-apple-darwin15.5.0 platform
2016.08.02 15:31:33 LOG5[ui]: Compiled/running with OpenSSL 1.0.2h  3 May 2016
2016.08.02 15:31:33 LOG5[ui]: Threading:PTHREAD Sockets:POLL,IPv6 TLS:ENGINE,FIPS,OCSP,PSK,SNI
2016.08.02 15:31:33 LOG5[ui]: Reading configuration from descriptor 3
2016.08.02 15:31:33 LOG5[ui]: UTF-8 byte order mark not detected
2016.08.02 15:31:33 LOG5[ui]: FIPS mode disabled
2016.08.02 15:31:33 LOG4[ui]: Insecure file permissions on stunnel.pem
2016.08.02 15:31:33 LOG5[ui]: Configuration successful
2016.08.02 15:31:38 LOG5[0]: Service [stunnel3] accepted connection from 127.0.0.1:56104
2016.08.02 15:31:38 LOG3[0]: s_connect: connect ::1:6000: Connection refused (61)
2016.08.02 15:31:38 LOG3[0]: s_connect: connect 127.0.0.1:6000: Connection reset by peer (54)
2016.08.02 15:31:38 LOG5[0]: Connection reset: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
2016.08.02 15:31:38 LOG5[1]: Service [stunnel3] accepted connection from 127.0.0.1:56107
2016.08.02 15:31:38 LOG5[1]: s_connect: connected 127.0.0.1:6000
2016.08.02 15:31:38 LOG5[1]: Service [stunnel3] connected remote server from 127.0.0.1:56108
2016.08.02 15:31:38 LOG5[2]: Service [stunnel3] accepted connection from 127.0.0.1:56109
2016.08.02 15:31:38 LOG3[2]: s_connect: connect ::1:6000: Connection refused (61)
2016.08.02 15:31:38 LOG5[2]: s_connect: connected 127.0.0.1:6000
2016.08.02 15:31:38 LOG5[2]: Service [stunnel3] connected remote server from 127.0.0.1:56111
2016.08.02 15:31:38 LOG5[3]: Service [stunnel3] accepted connection from 127.0.0.1:56112
2016.08.02 15:31:38 LOG5[3]: s_connect: connected 127.0.0.1:6000
2016.08.02 15:31:38 LOG5[3]: Service [stunnel3] connected remote server from 127.0.0.1:56113
2016.08.02 15:31:38 LOG5[3]: Connection closed: 0 byte(s) sent to SSL, 200000 byte(s) sent to socket
2016.08.02 15:31:38 LOG5[4]: Service [stunnel3] accepted connection from 127.0.0.1:56114
2016.08.02 15:31:38 LOG3[4]: s_connect: connect ::1:6000: Connection refused (61)
2016.08.02 15:31:38 LOG5[4]: s_connect: connected 127.0.0.1:6000
2016.08.02 15:31:38 LOG5[4]: Service [stunnel3] connected remote server from 127.0.0.1:56116
2016.08.02 15:31:39 LOG5[5]: Service [stunnel3] accepted connection from 127.0.0.1:56117
2016.08.02 15:31:39 LOG5[5]: s_connect: connected 127.0.0.1:6000
2016.08.02 15:31:39 LOG5[5]: Service [stunnel3] connected remote server from 127.0.0.1:56118
2016.08.02 15:31:39 LOG5[6]: Service [stunnel3] accepted connection from 127.0.0.1:56119
2016.08.02 15:31:39 LOG3[6]: s_connect: connect ::1:6000: Connection refused (61)
2016.08.02 15:31:39 LOG5[6]: s_connect: connected 127.0.0.1:6000
2016.08.02 15:31:39 LOG5[6]: Service [stunnel3] connected remote server from 127.0.0.1:56121
2016.08.02 15:31:39 LOG5[6]: Connection closed: 100000 byte(s) sent to SSL, 0 byte(s) sent to socket
2016.08.02 15:31:39 LOG5[7]: Service [stunnel3] accepted connection from 127.0.0.1:56122
2016.08.02 15:31:39 LOG5[7]: s_connect: connected 127.0.0.1:6000
2016.08.02 15:31:39 LOG5[7]: Service [stunnel3] connected remote server from 127.0.0.1:56123
2016.08.02 15:32:33 LOG5[cron]: Updating DH parameters
2016.08.02 15:32:38 LOG3[1]: transfer: s_poll_wait: TIMEOUTclose exceeded: closing
2016.08.02 15:32:38 LOG5[1]: Connection closed: 0 byte(s) sent to SSL, 0 byte(s) sent to socket
2016.08.02 15:32:38 LOG3[2]: transfer: s_poll_wait: TIMEOUTclose exceeded: closing
2016.08.02 15:32:38 LOG5[2]: Connection closed: 0 byte(s) sent to SSL, 5 byte(s) sent to socket
2016.08.02 15:32:39 LOG3[4]: transfer: s_poll_wait: TIMEOUTclose exceeded: closing
2016.08.02 15:32:39 LOG5[4]: Connection closed: 0 byte(s) sent to SSL, 200000 byte(s) sent to socket
2016.08.02 15:32:39 LOG3[5]: transfer: s_poll_wait: TIMEOUTclose exceeded: closing
2016.08.02 15:32:39 LOG5[5]: Connection closed: 5 byte(s) sent to SSL, 0 byte(s) sent to socket
2016.08.02 15:33:08 LOG5[cron]: DH parameters updated
^C2016.08.02 15:33:08 LOG3[ui]: Received SIGINT; terminating

As soon as phpunit hits React\Tests\SocketClient\SecureIntegration, the PHP process pins the CPU. When the stunnel process hits Updating DH parameters, then it pins the CPU, too.

I hope that helps. Let me know if I can provide any more information.

clue commented

I'm really not sure what to do here at the moment to be honest. I looks like the results of some of the tests are flaky, so this something I'll certainly look into. The stunnel setup does work okay-ish, but it's kind of annoying to set this up and reproduce locally. As such, I'm looking into getting reactphp/socket#24 out of the door so this is easier to reproduce.

If you have any idea on how to improve this otherwise or happen to play around with the "wrap secure" flags and get new insights, please let us know ๐Ÿ‘

clue commented

The stunnel setup does work okay-ish, but it's kind of annoying to set this up and reproduce locally. As such, I'm looking into getting reactphp/socket#24 out of the door so this is easier to reproduce.

The simplified test suite is now in via #78 :shipit:

You can now simply run the test suite without any additional test setup and it should cover everything that's been discussed in this ticket so far.

Can you retry your tests against the current master? ๐Ÿ‘

Certainly! I should be able to look into this again sometime this week.

clue commented

@mcrumm any progress on this or anything we can do to help this progress? ๐Ÿ‘

I'd really like to close this ticket and the v0.5 branch, so we can focus on the cool new features for the upcoming v0.6 branch ๐Ÿ˜Ž

@clue Sorry, I've been swamped, but I'll definitely run some additional tests tonight!

@clue Thanks for updating the test runner! This went much faster than last time ๐Ÿ˜„

Unfortunately, SecureIntegration tests continue to peg the CPU (and never finish) on 5.6.7 and 5.6.8. I wasn't able to run the 5.5.x tests again, as this version of phpunit wants php > 5.5.

~/Code/reactphp/socket-client on master$ vendor/bin/phpunit --testdox
PHPUnit 4.8.33 by Sebastian Bergmann and contributors.

React\Tests\SocketClient\DnsConnector
 [x] Pass by resolver if given ip
 [x] Pass through resolver if given host
 [x] Pass through resolver if given host which resolves to ipv 6
 [x] Pass by resolver if given complete uri
 [x] Rejects immediately if uri is invalid
 [x] Skip connection if dns fails
 [x] Cancel during dns cancels dns and does not start tcp connection
 [x] Cancel during tcp connection cancels tcp connection

React\Tests\SocketClient\Integration
 [x] Getting stuff from google should work
 [x] Getting encrypted stuff from google should work
 [x] Self signed rejects if verification is enabled
 [x] Self signed resolves if verification is disabled
 [x] Cancel pending connection

React\Tests\SocketClient\SecureConnector
 [x] Connection will wait for tcp connection
 [x] Connection with complete uri will be passed through expect for scheme
 [x] Connection to invalid scheme will reject
 [x] Cancel during tcp connection cancels tcp connection

React\Tests\SocketClient\SecureIntegration
clue commented

@mcrumm Thanks for the update!

If you can run the test suite with phpunit --debug, we may be able to pin point this to single test methods. You may add simply add $this->markTestSkipped() to individual test methods that block, so we can also check the others ๐Ÿ‘

clue commented

Besides, what do you think, how relevant are these versions anyway? Is it even worth putting effort into this or should we simply document this as a known issue an recommend an upgrade to versions that are known to work?

(afaict only PHP 5.6.7 and 5.6.8 are broken, anything newer and even the older and unsupported PHP versions works just fine?)

That was my thought, as well. It's likely that SSL support is just fundamentally broken in those releases. It's probably better to document the known issues on those versions and move on.

I'm happy to send the debug output, though, so we can get a better idea of where the problem lies, in case it reappears in later versions.

@clue 5.5.23, 5.6.7, and 5.6.8 were the affected versions I was able to identify. Although I wasn't able to test 5.5.23 this time around, I can't see where it would have magically started working since last time ๐Ÿ˜„

@clue I'd be happy to submit a PR to add docs on the PHP versions with broken SSL implementations, and then call that the resolution to this issue. Does that work for you and @cboden?

clue commented

@mcrumm go for it :shipit: