ssl issue
nightshade427 opened this issue · 25 comments
When I do
(as:with-event-loop (:catch-app-errors t)
(asf:alet ((mine (drakma-async:http-request "https://23.92.28.184" :preserve-uri t)))
(format t "~%data: ~a" mine)))
I get
SSL connection error: 336031996: error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol `
But if I do normal drakma it works
(as:with-event-loop (:catch-app-errors t)
(asf:alet ((mine (drakma:http-request "https://23.92.28.184" :preserve-uri t)))
(format t "~%data: ~a" mine)))
Any ideas?
I think I found it. There were a few symbols not being imported since I upgraded the backend drakma. ALSO once that was fixed, it seemed that re-using an SSL stream after a redirect was causing a segfault. I don't know why, I didnt do much digging. I just disabled stream re-use across the board and it seems to work a lot better now. This isn't ideal, but I'd rather get things working and optimize later.
Please give it a shot when you get a chance!
Still seems to be happening with latest drakma-async master :(
If you try the above async one a few times in a row you should get the error above.
Thanks for looking into this, I really appreciate it ;)
Any ideas? The original drakma doesnt seem to give this error. It only happens in two of our data centers as well (linode) where I dont think do a good job with ssl. But, drakma, curl, and others handle it fine. But, for some reason drakma-async bombs with
SSL connection error: 336031996: error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol
I also opened an issue with linode so they can fix root issue of ssl issues. But it seems curl and original drakam work around the issue and work fine. Any idea why drakma-async wouldnt?
Let me know if I can be of any help figuring this out.
Sorry Nick, was in Peru for a few weeks (without any computer) and just got back. I can take a look at this toinght.
Awesome, thanks!!
Hope you had a great trip ;)
It almost seems as though after multiple calls it somehow starts using http instead of https?
Can you verify the SSL versions being used on your servers? I wonder if some of them have a version mismatch. Either that, or an HTTPS client (drakma-async) is talking to an HTTP server. I'm unable to verify what you're seeing because I can't connect to 23.92.28.184
and HTTPS seems to be working with a lot of other sites I'm testing (doing what you suggested, trying each URL multiple times).
Yeah, it only seems to happen on linode cloud system within the same datacenter for some reason (cross data center is fine). But, drakma handles it fine (as shown above). So whatever linode is doing strange/wrong (seems like sometimes they redirect ssl port to http port or something strange, and it doesnt do it every time, but if I do a batch of 100 requests at least 1 or more will die with the above error) drakma and curl work around and work fine, but drakma-async doesnt. Its the strangest thing.
Well, drakma-async uses a hybrid of cl+ssl (for context creation) and libevent2-ssl (actually wraps the SSL protocol in async). So it's possible cl+ssl has some logic in it the others are using that libevent is just not doing correctly. One more reason to get off libevent, especially since the main reason to stay on it (other than avoiding a massive rewrite) is that it works with SSL out of the box. Great.
Alternatively, I could just be doing something stupid, but I'd be surprised if it worked some places and not others. It would really help me out if you could find an instance of this happening out in the wild on a public IP I can test against. All the SSL servers I hit work fine.
Here is more data that may help:
21:46:15.977275 IP li662-78.members.linode.com.55657 > li685-204.members.linode.com.htt
ps: Flags [S], seq 79222470, win 29200, options [mss 1460,sackOK,TS val 2896837 ecr 0,n
op,wscale 7], length 0
E..<..@.@..S.\.N.....i............r.]..........
.,3.........
21:46:15.977743 IP li685-204.members.linode.com.https > li662-78.members.linode.com.556
57: Flags [S.], seq 409137932, ack 79222471, win 28960, options [mss 1460,sackOK,TS val
307954466 ecr 2896837,nop,wscale 7], length 0
E..<..@.?..W.....\.N...i.b........q m..........
.[.".,3.....
21:46:15.977774 IP li662-78.members.linode.com.55657 > li685-204.members.linode.com.htt
ps: Flags [.], ack 1, win 229, options [nop,nop,TS val 2896838 ecr 307954466], length 0
....]......Z.\.N.....i.......b.
.,3..[."
21:46:15.977837 IP li662-78.members.linode.com.55657 > li685-204.members.linode.com.htt
ps: Flags [P.], seq 1:325, ack 1, win 229, options [nop,nop,TS val 2896838 ecr 30795446
6], length 324
....^........\.N.....i.......b.
.,3..[."GET /api/segmentation.json?from=en&token=XXXXX
ad&customer_id=YYY&texts=%5B%22This+is+a+test.%22,+%22
Yes+it+is.%22%5D HTTP/1.1
Host: 23.239.16.204
User-Agent: Drakma/1.3.9 (SBCL 1.2.1; Linux; 3.15.4-x86_64-linode45; http://weitz.de/dr
akma/)
Accept: */*
21:46:15.978058 IP li685-204.members.linode.com.https > li662-78.members.linode.com.556
57: Flags [.], ack 325, win 235, options [nop,nop,TS val 307954467 ecr 2896838], length
0
.................\.N...i.b.
.[.#.,3.
21:46:15.978218 IP li685-204.members.linode.com.https > li662-78.members.linode.com.556
57: Flags [P.], seq 1:422, ack 325, win 235, options [nop,nop,TS val 307954467 ecr 2896
838], length 421
.................\.N...i.b.
.[.#.,3.HTTP/1.1 400 Bad Request
Server: nginx/1.6.2
Date: Mon, 29 Sep 2014 21:46:15 GMT
Content-Type: text/html
Content-Length: 270
Connection: close
<html>
<head><title>400 The plain HTTP request was sent to HTTPS port</title></head>
<body bgcolor="white">
<center><h1>400 Bad Request</h1></center>
<center>The plain HTTP request was sent to HTTPS port</center>
<hr><center>nginx/1.6.2</center>
</body>
</html>
21:46:15.978231 IP li662-78.members.linode.com.55657 > li685-204.members.linode.com.htt
ps: Flags [.], ack 422, win 237, options [nop,nop,TS val 2896838 ecr 307954467], length
0
E..4..@.@..X.\.N.....i.......b......]......
.,3..[.#
21:46:15.978237 IP li685-204.members.linode.com.https > li662-78.members.linode.com.556
57: Flags [F.], seq 422, ack 325, win 235, options [nop,nop,TS val 307954467 ecr 289683
8], length 0
E..4.0@.?........\.N...i.b.......... ......
.[.#.,3.
21:46:16.017299 IP li662-78.members.linode.com.55657 > li685-204.members.linode.com.htt
ps: Flags [.], ack 423, win 237, options [nop,nop,TS val 2896850 ecr 307954467], length
0
E..4..@.@..W.\.N.....i.......b......]......
.,3..[.#
Its like it sends regular http to the https port, so it freaks out and blows up.
It doesnt do it every time though. Its like the faster the requests the more likely it is to happen. I tried giving each request its own ssl-ctx just to make sure that wasnt issue and was still having the same problem.
CL-USER> (as:with-event-loop ()
(asf:alet ((mine (smart-engine::segment-requests '("This is a test." "Yes it is.") "7577644b-81a1-47c4-8c55-9dc33dc2e621" "en")))
(format t "~%mine: ~s" mine)))
GET /api/segmentation.json?from=en&token=XXX&customer_id=YYY&texts=%5B%22This+is+a+test.%22,+%22Yes+it+is.%\
22%5D HTTP/1.1
Host: 23.239.16.204
User-Agent: Drakma/1.3.9 (SBCL 1.2.1; Linux; 3.15.4-x86_64-linode45; http://weitz.de/drakma/)
Accept: */*
error: SSL connection error: 336031996: error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol
1
CL-USER> (as:with-event-loop ()
(asf:alet ((mine (smart-engine::segment-requests '("This is a test." "Yes it is.") "7577644b-81a1-47c4-8c55-9dc33dc2e621" "en")))
(format t "~%mine: ~s" mine)))
GET /api/segmentation.json?from=en&token=XXX&customer_id=YYY&texts=%5B%22This+is+a+test.%22,+%22Yes+it+is.%\
22%5D HTTP/1.1
Host: 23.239.16.204
User-Agent: Drakma/1.3.9 (SBCL 1.2.1; Linux; 3.15.4-x86_64-linode45; http://weitz.de/drakma/)
Accept: */*
HTTP/1.1 200 OK
Server: nginx/1.6.2
Date: Tue, 30 Sep 2014 01:29:14 GMT
Content-Type: application/json; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Status: 200 OK
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
X-UA-Compatible: chrome=1
ETag: "aa00af1afb47c315261859de231c0142"
Cache-Control: max-age=0, private, must-revalidate
Set-Cookie: request_method=GET; path=/
X-Request-Id: a2a4806b-30b9-4019-bc81-65e6852312dd
X-Runtime: 0.243120
X-Powered-By: Phusion Passenger 4.0.37
mine: ("[\"This is a test.\"]" "[\"Yes it is.\"]")
Okay this does indeed seem to happen when the request connects too fast (guessing). Thats why the internal datacenter one triggers the issue, it connects blazing fast, while other sites do not. I replicated with a local nginx ssl server, 100% of the time. So you should be able to replicate easily now? Just setup a quick local nginx that uses ssl and try and grab something from it. I get the following error every time.
This has the ssl error: SSL connection error: 336031996: error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol
(as:with-event-loop ()
(asf:alet ((mine (drakma-async:http-request "https://127.0.0.1/index.html")))
(format t "~%data: ~s~&" mine)))
This does not:
(as:with-event-loop ()
(asf:alet ((mine (drakma:http-request "https://127.0.0.1/index.html")))
(format t "~%data: ~s~&" mine)))
Now that I can replicate easily, I will try and dig some more and update here if I fond anything.
Let me know if you find out anything ;)
I made requests to ssl sites that both work and dont work. In both cases the stream is wrapped correctly http-steam => stream => flexi-io-stream => chunka:chunked-io-stream => cl-async:async-io-stream => cl-async-ssl:ssl-socket
. So maybe it just not initializing the wrapper stream correctly? Because, even though its wrapped as above its sending the headers from drakma in plain text according to tcpdump.
This is super helpful, thanks for putting this work in. I'll try to get a local test setup via nginx and let you know what I find. You're right that it might be some timing issue where the data is sent before the SSL stream initializes (that'd be my guess too).
Thanks man, appreciate it. Let me know what you find. ;)
Just checking in to see if you had a chance to take a look yet? Anything else I can do to help?
Ok, reproduced this as well, and it appeared to be a timing issue. Why it's a timing issue is beyond my expertise. There's something happening in libevent SSL wrapper that doesn't immediately wrap the socket in SSL before it starts writing (that's my guess).
The good news is adding an as:delay
around the connect-tcp-socket
call in tcp-ssl-connect
fixed it (at least in my local test case). I'm guessing defering the connection to the next loop gives SSL time to do what it needs to. Please try it again when you get a chance.
i dont see any commits for that include the delay (so cant use master)? What should I test?
Let me know ;)
Let me know, I'll test it and report back ;)
Ohh its in cl-async ;)
WORKED!!
Thanks so much ;)
Oh DUH I forgot to mention that it's in cl-async. Sorry!!
I'm glad it's working now!