coro-http.request hangs on closed connection
SinisterRectus opened this issue · 14 comments
On rare occasions, requests infinitely hang. It's somewhat difficult to debug the coro libs because a lot of functionality is hidden in closures, but I think I've narrowed down the issue. I'm documenting what I have here until I or someone else can discover more about this.
- Receive
connection: close
instead ofconnection: keep-alive
in an HTTP response.
In http-codec
:
keepAlive
becomesfalse
elseif lowerKey == "connection" then
head.keepAlive = lower(value) == "keep-alive"
mode
becomesdecodeRaw
if head.keepAlive and (not (chunkedEncoding or (contentLength and contentLength > 0)))
or (head.method == "GET" or head.method == "HEAD") then
mode = decodeEmpty
elseif chunkedEncoding then
mode = decodeChunked
elseif contentLength then
bytesLeft = contentLength
mode = decodeCounted
elseif not head.keepAlive then
mode = decodeRaw
end
function decodeRaw(chunk, index)
if #chunk < index then return end
return sub(chunk, index)
end
In coro-http
:
read
returnsnil
foritem
body = {}
if req.method == "HEAD" then
connection.reset()
else
while true do
local item = read()
if not item then
res.keepAlive = false
break
end
if #item == 0 then
break
end
body[#body + 1] = item
end
end
keepAlive
isfalse
, so the connection is closed by callingwrite
with no arguments, but it never returns.
if res.keepAlive then
saveConnection(connection)
else
write()
end
The following reproduces the issue:
local weblit = require('weblit')
weblit.app.bind {host = '127.0.0.1', port = 1337}
.use(require('weblit-auto-headers'))
.route({
method = "DELETE",
path = "/foo",
}, function (req, res, go)
if req.query and req.query.q == '1' then
res.headers['Connection'] = 'Close'
end
res.code = 204
res.body = nil
end)
.start()
local http = require('coro-http')
coroutine.wrap(function()
p('req1', http.request("DELETE", "http://127.0.0.1:1337/foo"))
p('req2', http.request("DELETE", "http://127.0.0.1:1337/foo?q=1"))
end)()
The same issue occurs; write
never returns here:
if res.keepAlive then
saveConnection(connection)
else
write()
end
The trace after calling write
is:
coro-wrapper.lua:137
coro-wrapper.lua:138
coro-channel.lua:124
coro-channel.lua:130
coro-channel.lua:131
coro-channel.lua:132
coro-channel.lua:39
coro-channel.lua:40
coro-channel.lua:23
coro-channel.lua:24
coro-channel.lua:25
coro-channel.lua:26
coro-channel.lua:27
coro-channel.lua:74
coro-channel.lua:78
coro-channel.lua:27
coro-channel.lua:31
coro-channel.lua:32
coro-channel.lua:34
coro-channel.lua:133
coro-channel.lua:117
coro-channel.lua:120
coro-channel.lua:133
coro-channel.lua:134
coro-channel.lua:71
coro-channel.lua:89
coro-channel.lua:39
coro-channel.lua:40
coro-channel.lua:23
coro-channel.lua:81
coro-channel.lua:85
coro-channel.lua:86
coro-channel.lua:87
coro-channel.lua:88
coro-channel.lua:58
coro-channel.lua:64
coro-channel.lua:65
coro-channel.lua:66
coro-channel.lua:67
coro-channel.lua:68
coro-channel.lua:71
coro-channel.lua:89
coro-channel.lua:39
coro-channel.lua:40
coro-channel.lua:23
coro-channel.lua:24
coro-channel.lua:25
coro-channel.lua:26
coro-channel.lua:27
coro-channel.lua:74
coro-channel.lua:78
coro-channel.lua:27
coro-channel.lua:31
coro-channel.lua:32
coro-channel.lua:34
Looks like it craps out after trying to close the socket.
A quick fix for this is to change
function decodeRaw(chunk, index)
if #chunk < index then return end
return sub(chunk, index)
end
to
function decodeRaw(chunk, index)
if #chunk < index then return "", index end
return sub(chunk, index)
end
I don't know if that's the correct fix, though.
Here's the full strack trace for the request.
Additionally, I'm not sure why GET is included in (head.method == "GET" or head.method == "HEAD")
. Only HEAD is guaranteed to not contain a body.
Any idea what situation causes the chunk to be smaller than the index? I'm sure I put that check in there because I ran into it somewhere and it was breaking stuff.
Returning an empty string signals the end of the body. We need to make sure we don't do this if there is more body coming. That's why I'd like to understand what causes this situation in the first place.
Good question. The index is the chunk length + 1. The coro-channel dispatcher doesn't show anything unusual.
'<-' 'HTTP/1.1 204 No Content\r\nServer: false\r\nDate: Thu, 14 Sep 2017 14:19:05 GMT\r\nConnection: Keep-Alive\r\n\r\n'
'<-' nil
'<-' 'HTTP/1.1 204 No Content\r\nConnection: Close\r\nServer: false\r\nDate: Thu, 14 Sep 2017 14:19:05 GMT\r\n\r\n'
'<-' nil
The coro-wrapper goes into the decoder with
'HTTP/1.1 204 No Content\r\nServer: false\r\nDate: Thu, 14 Sep 2017 14:26:04 GMT\r\nConnection: Keep-Alive\r\n\r\n' 1
'HTTP/1.1 204 No Content\r\nServer: false\r\nDate: Thu, 14 Sep 2017 14:26:04 GMT\r\nConnection: Keep-Alive\r\n\r\n' 104
'HTTP/1.1 204 No Content\r\nServer: false\r\nDate: Thu, 14 Sep 2017 14:26:04 GMT\r\nConnection: Keep-Alive\r\n\r\n' 104
'HTTP/1.1 204 No Content\r\nConnection: Close\r\nServer: false\r\nDate: Thu, 14 Sep 2017 14:26:04 GMT\r\n\r\n' 1
'HTTP/1.1 204 No Content\r\nConnection: Close\r\nServer: false\r\nDate: Thu, 14 Sep 2017 14:26:04 GMT\r\n\r\n' 99
and comes out with
{ { 'Server', 'false' }, { 'Date', 'Thu, 14 Sep 2017 14:27:10 GMT' }, { 'Connection', 'Keep-Alive' }, code = 204, reason = 'No Content', version = 1.1, keepAlive = true } 104
'' 104
nil nil
{ { 'Connection', 'Close' }, { 'Server', 'false' }, { 'Date', 'Thu, 14 Sep 2017 14:27:10 GMT' }, code = 204, reason = 'No Content', version = 1.1, keepAlive = false } 99
nil nil
I'm not sure why there are 3 reads in the first request and 2 in the second.
It looks like the buffer is small enough to be decoded in one pass. What happens comes down to
if head.keepAlive and (not (chunkedEncoding or (contentLength and contentLength > 0)))
or (head.method == "GET" or head.method == "HEAD") then
mode = decodeEmpty
elseif chunkedEncoding then
mode = decodeChunked
elseif contentLength then
bytesLeft = contentLength
mode = decodeCounted
elseif not head.keepAlive then
mode = decodeRaw
end
When keepAlive
is true (and the other conditions are met), an empty string is returned by decodeEmpty
. Otherwise, in this case, nil is returned by decodeRaw
since the end of buffer is reached; there is no body after the headers on the DELETE request.
Hmm, I guess it's length + 1 because the end index is inclusive and equals length in lua's 1-based indexes.
It seems to me that that the client is trying to read a body that does not exist. Interestingly, if I change res.body = nil
to res.body = ''
in the weblit server example, the client request is successful. I suppose the empty string is made meaningful by the extra \r\n
.
Should a special case be added so that decodeEmpty
is called for a 204 response?
Should http-codec be handling contentLength == 0
?
So the idea is that the decoder will always emit a table for the head, followed by zero or more non-empty body chunks, followed by exactly one empty string to signify the end of body. After that the cycle repeats so that a single TCP connection can be reused for multiple http requests.
Of course, if the connection gets upgraded (via websockets), then the http-codec gets disconnected entirely and it's internal state GCed.
Right... but that doesn't happen. Again, coro-http.request
never returns.
Any thoughts on this? I cannot figure out how this is breaking. Debugging shows that the socket closes, but instead of bubbling back up to coro-http, it just jumps right to the Luvit exit sequence.
...
coro-http.lua,158
coro-http.lua,159
coro-http.lua,160
coro-http.lua,169
coro-http.lua,172 -- last call in coro-http is `write()`
coro-wrapper.lua,137
coro-wrapper.lua,138
coro-channel.lua,124
coro-channel.lua,130
coro-channel.lua,131
coro-channel.lua,132
coro-channel.lua,39
coro-channel.lua,40
coro-channel.lua,23
coro-channel.lua,24
coro-channel.lua,25
coro-channel.lua,26
coro-channel.lua,27
coro-channel.lua,74
coro-channel.lua,78
coro-channel.lua,27
coro-channel.lua,31
coro-channel.lua,32
coro-channel.lua,34
coro-channel.lua,133
coro-channel.lua,117
coro-channel.lua,120
coro-channel.lua,133
coro-channel.lua,134
coro-channel.lua,71
coro-channel.lua,89
coro-channel.lua,39
coro-channel.lua,40
coro-channel.lua,23
bundle:init.lua,53
bundle:init.lua,55
bundle:init.lua,57
...
I suspect some coroutine issue, but I do not know enough about the expected behavior of the libraries to debug this.
Again, here is an example that reproduces the issue:
local weblit = require('weblit')
weblit.app.bind {host = '127.0.0.1', port = 1337}
.use(require('weblit-auto-headers'))
.route({
method = "DELETE",
path = "/foo",
}, function (req, res)
if req.query and req.query.q == '1' then
res.headers['Connection'] = 'Close'
end
res.code = 204
res.body = nil
end)
.start()
local http = require('coro-http')
coroutine.wrap(function()
debug.sethook(function(_, line)
local source = debug.getinfo(2, 'S').source:match('([^/\\]+)$')
io.write(source, ',', line, '\n') -- luvit print fails after a certain point
end, 'l')
-- p(http.request("DELETE", "http://127.0.0.1:1337/foo")) -- good request
p(http.request("DELETE", "http://127.0.0.1:1337/foo?q=1")) -- bad request
end)()
Also interesting: If you try to print debug info with print
, it fails after a certain point. io.write
continues to work.
Okay, I've just managed to isolate the issue to an extent.
In coro-channel:
local function wait()
local thread = coroutine.running()
return function (err)
assert(coroutine.resume(thread, err))
end
end
if chunk == nil then
closer.written = true
closer.check()
socket:shutdown(wait())
return coroutine.yield()
end
The coroutine seems to never be resumed on socket shutdown; the wait closure is never called.
I made a PR to address the handling of the shutdown. In this case, the specific error is EPIPE: broken pipe
.