Warp busy-loops on EMFILE
Closed this issue · 8 comments
This is a follow-up to #830. Adding in subscribers from there: @snoyberg @domenkozar @kazu-yamamoto @ProofOfKeags.
The retry logic added in https://github.com/yesodweb/wai/pull/831/files does not make sense to me.
It is present since warp-3.3.19.
As @snoyberg wrote in #830 (comment), that code will busy-loop:
```haskell
acceptNewConnection = do
    ex <- try getConnMaker
    case ex of
        Right x -> return $ Just x
        Left e -> do
            let getErrno (Errno cInt) = cInt
                eConnAborted = getErrno eCONNABORTED
                eMfile = getErrno eMFILE
                merrno = ioe_errno e
            if merrno == Just eConnAborted || merrno == Just eMfile
                then acceptNewConnection
```
The busy-loop is:
```haskell
acceptNewConnection = do
    ex <- try getConnMaker
    case ex of
        Left e -> do
            -- ...
            if ... || merrno == Just eMfile
                then acceptNewConnection
```
Here is a minimal example that demonstrates it. We set the open-files limit to 1024, then open files until all 1024 file descriptors are in use. Then we start Warp.
```haskell
#!/usr/bin/env stack
-- stack script --compile --resolver lts-20.18 --package warp --package wai --package http-types --package unix --package directory
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Concurrent (threadDelay)
import Control.Exception
import Data.Foldable
import Data.Traversable
import Network.Wai
import Network.HTTP.Types.Status (status200)
import qualified Network.Wai.Handler.Warp as Warp
import System.Directory (listDirectory)
import System.Environment (getArgs)
import System.IO
import System.Posix.Resource

main :: IO ()
main = do
  [numFdsToLeaveFreeStr] <- getArgs
  let numFdsToLeaveFree = read numFdsToLeaveFreeStr :: Int
  -- Set max open files to 1024, as it is the default on many Linux systems.
  -- For reproducibility, just in case the current system has a higher limit.
  setResourceLimit ResourceOpenFiles (ResourceLimits{ softLimit = ResourceLimit 1024, hardLimit = ResourceLimit 1024 })
  numOpenFiles <- length <$> listDirectory "/proc/self/fd"
  let numExtraFilesToOpen = 1024 - numOpenFiles - numFdsToLeaveFree
  putStrLn $ show numOpenFiles ++ " open files, opening another " ++ show numExtraFilesToOpen
  bracket
    (for [1..numExtraFilesToOpen] $ \_i -> openFile "/proc/self/cmdline" ReadMode) -- open many files; it doesn't matter which ones
    (\files -> do
      putStrLn "Closing files"
      traverse_ hClose files
    )
    $ \_files ->
      Warp.run 1234 $ \_req respond -> do
        putStrLn "got request, sleeping before responding"
        threadDelay 1000000
        putStrLn "responding"
        respond $ responseLBS status200 [] "hi"
```
(lts-20.18 ships warp-3.3.23, so it includes the code from #831, which was first released in warp-3.3.19.)
Running the above with `stack Main.hs 0` and attaching with `sudo strace -fyp "$(pidof Main)"`, we can observe the busy-loop:
```
[pid 1935033] accept4(1023<socket:[232666022]>, 0x42015de340, [128], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EMFILE (Too many open files)
[pid 1935033] accept4(1023<socket:[232666022]>, 0x42015de450, [128], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EMFILE (Too many open files)
[pid 1935033] accept4(1023<socket:[232666022]>, 0x42015de560, [128], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EMFILE (Too many open files)
```
This is bad:
- The process busy-spins at 100% CPU.
- No error is reported, no exception is thrown.
- If you run it from ghci with `:main` and press Ctrl+C there, nothing happens; another Ctrl+C will bring you back to the ghci prompt, but the `accept4()` busy loop will go on in the background.
Further, the issue also happens when there are still FDs free when Warp starts.
Run `stack Main.hs 1` instead of `stack Main.hs 0` to leave one FD free. Now Warp starts without error. But when you run `curl 'http://localhost:1234'`, then for the duration of the request handler (1 second, as per my `threadDelay`), Warp busy-loops on `EMFILE`.
This means the issue can also occur at any time during production usage: as soon as any request pushes the process to the open-files limit, the busy looping starts, and clients like `curl` hang until Warp gets a free FD.
Further problems:

- The busy loop runs without any timeout checks (e.g. `setTimeout`). This means that in this state, slowloris protection is somewhat defunct, and a `curl` may hang for much longer than the value you give to `setTimeout`.

But you cannot `close()` a socket that you haven't `accept()`ed; anything that sits in the `listen(..., int backlog)` queue cannot, to my knowledge, be ejected, so we probably cannot do anything about that. (One could set `backlog = 1` to reject further connections, but from a quick search it isn't guaranteed that the OS honours that hint, and some places document that a small backlog can reduce web-serving performance -- probably in the case where the userspace program cannot `accept()` fast enough.)
A hanging `curl` is problematic when building reliable systems. It is better when web servers can tell clients that they are too busy, e.g. by returning HTTP `503 Service Unavailable`, which is the standard way to indicate that a server is overloaded. Many web servers have a setting for the maximum number of connections they will accept, independent of (but necessarily lower than) the maximum number of open files. For example, `nginx` has `worker_connections`, so that it can return `503 Service Unavailable` when those are exhausted, before it runs into `EMFILE`. Ideally Warp should have such a setting too, and count how many request handlers are currently active. That setting could default to something reasonably below `getResourceLimit ResourceOpenFiles`. Of course this is not perfect, since `warp`-the-library may run alongside other Haskell threads that open files, and the request handler itself might open more files, so a hang in `curl` due to `EMFILE` cannot be entirely avoided (unless one patches the entire system to count every FD opened and add that number to the threshold). But such a setting could still be very effective in practice: if the `rlimit` is 10k open files, Warp could default this `maxConnections` setting to e.g. 5k, leaving plenty of headroom before a hang. This would adopt nginx's philosophy.
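The connection-counting idea could be sketched roughly as follows. This is only an illustration, not Warp's API: `withConnectionSlot`, `activeCount`, and `onOverload` are hypothetical names, and a real implementation would wire the overload branch to a `503` response.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, modifyMVar, modifyMVar_, readMVar)
import Control.Exception (finally)

-- Hypothetical sketch: track the number of active connections in an
-- MVar, and run a cheap overload action (e.g. respond with
-- 503 Service Unavailable) once the configured limit is reached.
withConnectionSlot :: MVar Int -> Int -> IO a -> IO a -> IO a
withConnectionSlot activeCount maxConnections onOverload handler = do
  admitted <- modifyMVar activeCount $ \n ->
    if n >= maxConnections
      then return (n, False)      -- over the limit: do not admit
      else return (n + 1, True)   -- admit and count this connection
  if admitted
    -- 'finally' guarantees the slot is released even if the handler throws.
    then handler `finally` modifyMVar_ activeCount (\n -> return (n - 1))
    else onOverload               -- e.g. respond with 503
```

The point of the sketch is that rejecting happens cheaply, before a new FD-consuming connection handler is started.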
My recommendations:

- Warp should not busy-loop.
  - For the case where Warp cannot even run the initial `accept()`, there should be some configurable `retryAcceptDelay` between `accept()` retries.
  - For the case where Warp did accept some connections, it should wait until either `retryAcceptDelay` has passed, or another Warp request handler has finished (which frees up an FD).
- Warp should add a `maxConnections` setting; exceeding it should return `503 Service Unavailable`.
  - This will prevent hangs in practice.
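The first recommendation could look roughly like this. This is a sketch, not Warp's actual code: `acceptRetrying` is a hypothetical name, and the delay parameter stands in for the proposed `retryAcceptDelay` setting.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Concurrent (threadDelay)
import Control.Exception (IOException, try)

-- Hypothetical sketch: retry a failing accept-like action with a
-- delay between attempts, instead of busy-looping on it.
acceptRetrying :: Int -> IO a -> IO a
acceptRetrying retryDelayMicros acceptAction = do
  result <- try acceptAction
  case result of
    Right conn -> return conn
    Left (_ :: IOException) -> do
      -- Sleep before retrying, so a persistent EMFILE condition costs
      -- ~0% CPU instead of a 100% busy-spin. A real implementation
      -- should also check the errno and re-throw unexpected errors,
      -- as the current Warp code does.
      threadDelay retryDelayMicros
      acceptRetrying retryDelayMicros acceptAction
```

A refinement, per the second bullet, would be to also wake up early when a request handler finishes (e.g. by blocking on an `MVar` signalled from the handler's cleanup), rather than always sleeping the full delay.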
The behaviour of warp < 3.3.19 was as follows (pin an older resolver to get e.g. warp-3.3.18):
- You run `stack Main.hs 1` and `time curl 'http://localhost:1234'`.
- Warp `accept()`s the request and forks off the request handler thread (which takes 1 second in my code above).
- Warp immediately calls `accept()` again to wait for the next client. This fails with `EMFILE`. Warp immediately prints `Network.Socket.accept: resource exhausted (Too many open files)`, but does not terminate.
- Warp waits until all previous request handlers are done. It no longer accepts new requests; they hang on the client side.
- Then Warp exits.
This was bad because:
- Any long-running request handler (e.g. a slow streaming up/download) will keep Warp alive, but in a state where new clients like `curl` hang permanently.
- These clients continue to hang even if lots of FDs have been freed up in the meantime.
From this perspective, the change from #830 made sense:
It allows Warp to accept new clients, resolving their hangs as soon as FDs become free.
So I'm in general in favour of that change from #830; it's just that the implementation isn't great (busy-looping without sleeping).
> Warp waits until all previous accept handlers are over.
This is made worse by the fact that Warp does not detect client disconnects (#196).
It means that the `EMFILE` issue can be triggered very easily in realistic setups.
For example, consider the common scenario that you have Warp+Postgres web app. For uptime checking, you use e.g. AWS Route53 health checks, Consul health checks, or some uptime checking service such as Pingdom.
These HTTP checks usually have some HTTP timeout set, e.g. 5 seconds.
Let's say you implement a route `/db-readable` in Warp and point your health check at it.
Assume that when the DB goes down, the Postgres connection from the Warp request handler blocks.
Then the health check will query `/db-readable`, give up, and disconnect after 5 seconds.
But Warp does not notice the disconnect. It leaks the FD for a very long time (until the default TCP timeout of 3600 seconds).
(The slowloris timeout does not help with this, as the client has already fully sent the request, and it's now on the server to produce a response.)
If your health checks come in once per second, you'll exceed 1024 leaked FDs in less than half an hour.
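A quick back-of-envelope check of that claim, with the numbers from the scenario above (one leaked FD per check, one check per second):

```haskell
-- Time until a given FD limit is exhausted by a steady leak rate.
secondsToExhaustFds :: Int -> Int -> Int
secondsToExhaustFds fdLimit leaksPerSecond = fdLimit `div` leaksPerSecond

-- secondsToExhaustFds 1024 1 == 1024 seconds, i.e. ~17 minutes,
-- well under half an hour.
```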
Other web servers do not have this problem:
I show how `nc` and `nginx` detect disconnects here: #196 (comment)
@nh2 Thank you for analyzing.
Could you send a PR according to your recommendations?
@nh2 gentle ping
@kazu-yamamoto Unfortunately I haven't had the time yet to look into the implementation. I'll post here in case I start it!
I'm still getting this on warp-3.3.31, and it's pretty devastating to see the server busy-loop and take down production.
@kazu-yamamoto I had time to look now again.
I think that simply reverting that commit, as done in PR #933, will not improve the situation because it will just re-introduce the issues mentioned in #928 (comment) (which describes the issues without that commit).
I suspect we really have to implement the approach from "My recommendations" in #928 (comment).