erikd-ambiata/test-warp-wai

Data leakage between network streams with wai/warp/http-conduit

Closed this issue · 22 comments

erikd commented

Hi @snoyberg,

I hope it's OK to tag you like this. I'm hunting down a particularly weird bug which I currently suspect is in warp, wai or http-conduit. I've raised the issue here and tagged you because I can't figure out how to make further progress tracking it down. Although I currently think it's warp, wai, http-conduit, or maybe even TLS, it could also be an issue in the GHC runtime system.

The back story is that I first noticed it in one of our internal systems, which has a Warp based HTTP server (via Airship) and writes data to AWS (over HTTPS) using amazonka, which sits on top of http-conduit.

When I ran my server and made a wget or curl request against it, I would occasionally get a response like the following:

00000000  15 03 01 00 20 c7 40 42  0a 54 86 1f 8e cc 84 87  |.... .@B.T......|
00000010  b0 e3 0a 86 ec ba aa 37  3a 65 2e 57 3b fa 8d 09  |.......7:e.W;...|
00000020  58 f7 1e 36 cd 48 54 54  50 2f 31 2e 31 20 32 30  |X..6.HTTP/1.1 20|
00000030  30 20 4f 4b 0d 0a 54 72  61 6e 73 66 65 72 2d 45  |0 OK..Transfer-E|
00000040  6e 63 6f 64 69 6e 67 3a  20 63 68 75 6e 6b 65 64  |ncoding: chunked|
00000050  0d 0a 44 61 74 65 3a 20  54 75 65 2c 20 33 30 20  |..Date: Tue, 30 |
00000060  41 75 67 20 32 30 31 36  20 30 33 3a 33 39 3a 35  |Aug 2016 03:39:5|
00000070  32 20 47 4d 54 0d 0a 53  65 72 76 65 72 3a 20 57  |2 GMT..Server: W|
00000080  61 72 70 2f 33 2e 30 2e  31 33 2e 31 0d 0a 41 69  |arp/3.0.13.1..Ai|
00000090  72 73 68 69 70 2d 54 72  61 63 65 3a 20 62 31 33  |rship-Trace: b13|

As you can see, the response I should be getting starts at byte offset 37 (0x25 in the dump). The data before that is garbage, or so I thought. I spent a significant amount of time tracking it down before I figured out that if I disabled the component that uploads to AWS, the problem disappeared.

I then noticed that the garbage bytes prepended to the response are actually a TLS alert packet:

15             Alert protocol type
03 01          SSL version (TLS 1.0)
00 20          Message length (32 bytes)

Somehow, data from the http-conduit connection to AWS (over HTTPS) is leaking into the HTTP stream; the 5-byte record header plus the 32-byte alert body account exactly for the 37 bytes of garbage in front of my response. I managed to confirm this using Wireshark. I can actually capture packets that contain one or more of these TLS alert packets (complete, according to the message length field) followed by my expected HTTP response.
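
For anyone poking at the capture, here's a tiny stand-alone sketch (not from the test repo; the helper name is mine) of how the leaked prefix can be recognised as a TLS alert record, using only the bytestring package:

import qualified Data.ByteString as BS

-- True when the stream starts with a TLS record header of content type 0x15
-- (alert) followed by a plausible 0x03 xx protocol version byte pair.
startsWithTlsAlert :: BS.ByteString -> Bool
startsWithTlsAlert bs =
  case BS.unpack (BS.take 3 bs) of
    [0x15, 0x03, minor] -> minor <= 0x04   -- SSL 3.0 .. TLS 1.3
    _                   -> False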

I've also spent many hours working on a reproducible test case I can send you. Unfortunately, reproducing what I reported above (the TLS alert packet prepended to a legitimate HTTP response) was difficult and took much experimentation and tweaking. The program also exposes what may be another problem (the HTTPS HandshakeFailed issue), but I'm less sure of that.

This repo contains the code that reproduces this problem, and the Readme.md describes the reproduction. First of all I'm interested to see if you can reproduce it (others here in the office have reproduced the original problem on another variant of Linux and on Mac). Then I'd be interested in any advice on how to make further progress on tracking this down. Finally, I'm also interested to hear whether you think the HandshakeFailed errors described in the Readme.md are a bug (I actually think they are closely related to this TLS alert prepend issue).

Thanks!

It would be a good idea to include a stack.yaml file so that everyone testing this is sure to be using the same library versions.

PR #2 adds a stack.yaml file so we can be sure we're all looking at the same thing.

erikd commented

I have a theory about this issue. My suspicion is that:

  • A TLS connection goes bad and is terminated via an exception.
  • The socket gets reused for an HTTP connection.
  • The TLS finalizer gets run on the (now reused) socket, writing an alert packet.
  • The HTTP response then gets written on the same socket.

Not sure if this scenario is even possible.
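
For what it's worth, here is a rough, purely hypothetical sketch of the hazard I mean (none of this is code from warp, tls, or the test repo, and it assumes the fd helpers of a recent network release): once the socket is closed, its fd number can be reused, so a late write from a finalizer lands on whichever stream owns that fd now.

import           Network.Socket            (Socket, close, mkSocket, withFdSocket)
import           Network.Socket.ByteString (sendAll)
import qualified Data.ByteString           as BS

-- The first bytes of the leaked prefix: a TLS alert record header.
alertHeader :: BS.ByteString
alertHeader = BS.pack [0x15, 0x03, 0x01, 0x00, 0x20]

-- The buggy pattern: capture the raw fd, close the socket, and hand back a
-- "finalizer" that writes to that fd later, after the kernel may have given
-- the same number to a brand new (plain HTTP) connection.
buggyCleanup :: Socket -> IO (IO ())
buggyCleanup sock = do
  fd <- withFdSocket sock return   -- raw file descriptor as a CInt
  close sock                       -- the fd number is now free for reuse
  return $ do
    s <- mkSocket fd               -- rewraps whatever that fd points to *now*
    sendAll s alertHeader          -- the late alert ends up on the wrong stream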

Funny you should mention that, I just got pinged with this:

snoyberg/http-client#225 (comment)

Sounds ominously familiar actually.

Yup, this looks like the same bug I was seeing at snoyberg/http-client#225.

erikd commented

Confirmed. If I edit my cabal file, set connection == 0.2.5.*, and rebuild, I cannot reproduce this issue. In addition, the HandshakeFailed messages that I was unsure about have disappeared. I don't get a single one!

Can you also try testing against this fix: snoyberg/http-client#226?

erikd commented

Will do! I assume that's a version of http-conduit and connection == 0.2.6.*?

Yes, that would be the idea, though the patch in question only touches http-client, not http-conduit.

Here's an example of how to easily test this: yesodweb/shakespeare#194 (comment)

erikd commented

Doesn't seem to fix all of the issues.

Here's what I did (with connection == 0.2.6.*):

# Start from a clean sandbox.
cabal sandbox delete
cabal sandbox init
# Clone the http-client mega-repo and apply the PR #226 patch on top of it.
git clone https://github.com/snoyberg/http-client
(cd http-client \
    && wget -O - https://github.com/snoyberg/http-client/pull/226.patch \
    | git am -s)
# Bump the http-conduit version so cabal prefers this local copy over Hackage.
sed -i s/2.2.0.1/2.2.0.1123/ http-client/http-conduit/http-conduit.cabal
cabal sandbox add-source http-client/http-conduit
cabal install --dependencies-only

With that last line I watched the cabal install and noted that http-conduit-2.2.0.1123 was installed.

I then ran cabal build and ran the test in the Readme.md. I managed to reproduce all the issues I was concerned about, including the leakage of TLS alert packets into the HTTP stream (on the 4th run of the test).

But you need to add http-client, not http-conduit.

erikd commented

I'll have to hack my code to use http-client.

No, it's used by http-conduit

erikd commented

My test was using HTTPS, which means I should switch to http-client-tls. Has that also been fixed?

erikd commented

So http-conduit uses http-client?

Yes
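
For anyone else following along, here is a minimal sketch of the layering (the imports are the stock http-client / http-client-tls API, but treat the exact names as my assumption; this is not code from the test repo): even code written against http-conduit does its connection handling in http-client and its HTTPS in http-client-tls / connection / tls, which is why a patch to http-client matters for these requests.

import Network.HTTP.Client     (httpLbs, newManager, parseRequest, responseStatus)
import Network.HTTP.Client.TLS (tlsManagerSettings)

main :: IO ()
main = do
  manager  <- newManager tlsManagerSettings        -- TLS plumbing from http-client-tls / connection / tls
  request  <- parseRequest "https://example.com"   -- hypothetical endpoint, standing in for AWS
  response <- httpLbs request manager              -- the same call http-conduit makes underneath
  print (responseStatus response)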

Yuras commented

@erikd I was able to reproduce the issue in your test case. I got

dummy-https-server: HandshakeFailed (Error_Packet_Parsing "Failed reading: cannot decode alert level\nFrom:\talerts\n\n")

in three runs in a row.

After compiling against http-client with the patch applied, I don't see the error any more (tried three times). So I believe the patch fixes the issue.

erikd commented

Ok, I seem to have my test setup with the patched version of http-client.

I don't see the HandshakeFailed messages. That's good. However, I do see the test-warp-wai process going into a failure state where, until terminated, it prints:

runHttpsClient : ConnectionFailure getProtocolByName: does not exist (no such protocol name: tcp)

This is due to file descriptor starvation. It did not happen with connection == 0.2.5.* even after running for over 5 minutes; with connection == 0.2.6.* and the patch, it happens within about 2 minutes.
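
As a quick diagnostic (not part of the test case; the helper is just something I'd bolt on), the fd exhaustion is easy to confirm on Linux by counting the entries in /proc/self/fd while the test runs:

import System.Directory (listDirectory)

-- Number of file descriptors the process currently holds open (Linux only).
openFdCount :: IO Int
openFdCount = length <$> listDirectory "/proc/self/fd"

If that count climbs steadily towards the process's ulimit, it lines up with the getProtocolByName failures.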

That sounds like a separate bug that should get opened against the connection repo.

I had the same experience with Amazonka. Sounds pretty related.

Data leakage has been fixed in (I think) tls.