Warp busy-loops on EMFILE
Closed this issue · 8 comments
This is a follow-up to #830. Adding in subscribers from there: @snoyberg @domenkozar @kazu-yamamoto @ProofOfKeags.
The retry logic added in https://github.com/yesodweb/wai/pull/831/files does not make sense to me.
It is present since warp-3.3.19.
As @snoyberg wrote in #830 (comment), that code will busy-loop:
```haskell
acceptNewConnection = do
    ex <- try getConnMaker
    case ex of
        Right x -> return $ Just x
        Left e -> do
            let getErrno (Errno cInt) = cInt
                eConnAborted = getErrno eCONNABORTED
                eMfile = getErrno eMFILE
                merrno = ioe_errno e
            if merrno == Just eConnAborted || merrno == Just eMfile
                then acceptNewConnection
```
The busy-loop is:
```haskell
acceptNewConnection = do
    ex <- try getConnMaker
    case ex of
        Left e -> do
            -- ...
            if ... || merrno == Just eMfile
                then acceptNewConnection
```
Here is a minimal example that demonstrates it. We set the open-files limit to 1024, then open files until all 1024 file descriptors are in use. Then we start Warp.
```haskell
#!/usr/bin/env stack
-- stack script --compile --resolver lts-20.18 --package warp --package wai --package http-types --package unix --package directory
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Concurrent (threadDelay)
import Control.Exception
import Data.Foldable
import Data.Traversable
import Network.Wai
import Network.HTTP.Types.Status (status200)
import qualified Network.Wai.Handler.Warp as Warp
import System.Directory (listDirectory)
import System.Environment (getArgs)
import System.IO
import System.Posix.Resource

main :: IO ()
main = do
  [numFdsToLeaveFreeStr] <- getArgs
  let numFdsToLeaveFree = read numFdsToLeaveFreeStr :: Int
  -- Set max open files to 1024, as it is the default on many Linux systems.
  -- For reproducibility, just in case the current system has a higher limit.
  setResourceLimit ResourceOpenFiles (ResourceLimits{ softLimit = ResourceLimit 1024, hardLimit = ResourceLimit 1024 })
  numOpenFiles <- length <$> listDirectory "/proc/self/fd"
  let numExtraFilesToOpen = 1024 - numOpenFiles - numFdsToLeaveFree
  putStrLn $ show numOpenFiles ++ " open files, opening another " ++ show numExtraFilesToOpen
  bracket
    (for [1..numExtraFilesToOpen] $ \_i -> openFile "/proc/self/cmdline" ReadMode) -- open many files; it doesn't matter which ones
    (\files -> do
      putStrLn "Closing files"
      traverse_ hClose files
    )
    $ \_files ->
      Warp.run 1234 $ \_req respond -> do
        putStrLn "got request, sleeping before responding"
        threadDelay 1000000
        putStrLn "responding"
        respond $ responseLBS status200 [] "hi"
```
(lts-20.18 ships warp-3.3.23, so it includes the code from #831, which was first released in warp-3.3.19.)
Running the above with `stack Main.hs 0` and attaching with `sudo strace -fyp "$(pidof Main)"`, we can observe the busy-loop:
```
[pid 1935033] accept4(1023<socket:[232666022]>, 0x42015de340, [128], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EMFILE (Too many open files)
[pid 1935033] accept4(1023<socket:[232666022]>, 0x42015de450, [128], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EMFILE (Too many open files)
[pid 1935033] accept4(1023<socket:[232666022]>, 0x42015de560, [128], SOCK_CLOEXEC|SOCK_NONBLOCK) = -1 EMFILE (Too many open files)
```
This is bad:
- The process busy-spins at 100% CPU.
- No error is reported, no exception is thrown.
- If you run it from ghci with `:main` and press Ctrl+C there, nothing happens; another Ctrl+C will bring you back to the ghci prompt, but the `accept4()` busy loop will go on in the background.
Further, the issue also happens when there are still FDs free when Warp starts.
Run `stack Main.hs 1` instead of `stack Main.hs 0` to leave one FD free. Now Warp starts without error. But when you run `curl 'http://localhost:1234'`, then for the duration of the request handler (1 second, as per my `threadDelay`), Warp busy-loops on `EMFILE`.
This means the issue can also occur at any time during production usage: as soon as any request pushes the process to the open-files limit, the busy looping starts, and clients like `curl` hang until Warp gets a free FD.
Further problems:

- The busy loop runs without any timeout checks (e.g. `setTimeout`). This means that in this state, slowloris protection is somewhat defunct, and a `curl` may hang for much longer than the value you give to `setTimeout`.

But you cannot `close()` a socket that you haven't `accept()`ed; anything that sits in the `listen(..., int backlog)` queue cannot, to my knowledge, be ejected, so we probably cannot do anything about that. (One could set `backlog = 1` to reject further connections, but from a quick search it isn't guaranteed that the OS honours that hint, and some places document that a small backlog can reduce web-serving performance -- probably in the case where the userspace program cannot `accept()` fast enough.)
A hanging `curl` is problematic when building reliable systems. It is better when web servers can tell clients that they are too busy, e.g. by returning HTTP `503 Service Unavailable`, which is the standard way to indicate that a server is overloaded. Many web servers have a setting for the maximum number of connections they will accept, independent of (but necessarily lower than) the maximum number of open files. For example, `nginx` has `worker_connections`, so that it can return `503 Service Unavailable` when those are exhausted, before it runs into `EMFILE`. Ideally Warp should have such a setting too, and count how many request handlers are currently active. That setting could default to something reasonably below `getResourceLimit ResourceOpenFiles`. Of course this is not perfect, since `warp`-the-library may run alongside other Haskell threads that open files, and the request handler itself might open more files, so a hang in `curl` due to `EMFILE` cannot be entirely avoided (unless one patches the entire system to count every FD opened and add that number to the threshold). But such a setting could still be very effective in practice: if the `rlimit` is 10k open files, Warp could default this `maxConnections` setting to e.g. 5k, leaving plenty of headroom before a hang. This would adopt nginx's philosophy.
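The connection-counting idea could be sketched roughly as follows. This is only an illustration, not Warp's API: `withConnectionSlot`, `activeCount`, and `onOverload` are hypothetical names, and a real implementation would wire the overload branch to a `503` response.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, modifyMVar, modifyMVar_, readMVar)
import Control.Exception (finally)

-- Hypothetical sketch: track the number of active connections in an
-- MVar, and run a cheap overload action (e.g. respond with
-- 503 Service Unavailable) once the configured limit is reached.
withConnectionSlot :: MVar Int -> Int -> IO a -> IO a -> IO a
withConnectionSlot activeCount maxConnections onOverload handler = do
  admitted <- modifyMVar activeCount $ \n ->
    if n >= maxConnections
      then return (n, False)      -- over the limit: do not admit
      else return (n + 1, True)   -- admit and count this connection
  if admitted
    -- 'finally' guarantees the slot is released even if the handler throws.
    then handler `finally` modifyMVar_ activeCount (\n -> return (n - 1))
    else onOverload               -- e.g. respond with 503
```

The point of the sketch is that rejecting happens cheaply, before a new FD-consuming connection handler is started.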
My recommendations:

- Warp should not busy-loop.
  - For the case where Warp cannot even run the initial `accept()`, there should be some configurable `retryAcceptDelay` between `accept()` retries.
  - For the case where Warp did accept some connections, it should wait until either `retryAcceptDelay` has passed, or another Warp request handler has finished (which frees up an FD).
- Warp should add a `maxConnections` setting; exceeding it should return `503 Service Unavailable`.
  - This will prevent hangs in practice.
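The first recommendation could look roughly like this. This is a sketch, not Warp's actual code: `acceptRetrying` is a hypothetical name, and the delay parameter stands in for the proposed `retryAcceptDelay` setting.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Concurrent (threadDelay)
import Control.Exception (IOException, try)

-- Hypothetical sketch: retry a failing accept-like action with a
-- delay between attempts, instead of busy-looping on it.
acceptRetrying :: Int -> IO a -> IO a
acceptRetrying retryDelayMicros acceptAction = do
  result <- try acceptAction
  case result of
    Right conn -> return conn
    Left (_ :: IOException) -> do
      -- Sleep before retrying, so a persistent EMFILE condition costs
      -- ~0% CPU instead of a 100% busy-spin. A real implementation
      -- should also check the errno and re-throw unexpected errors,
      -- as the current Warp code does.
      threadDelay retryDelayMicros
      acceptRetrying retryDelayMicros acceptAction
```

A refinement, per the second bullet, would be to also wake up early when a request handler finishes (e.g. by blocking on an `MVar` signalled from the handler's cleanup), rather than always sleeping the full delay.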
The behaviour of warp < 3.3.19 was as follows (pin an older resolver to get e.g. warp-3.3.18):
- You run `stack Main.hs 1` and `time curl 'http://localhost:1234'`.
- Warp `accept()`s the request and forks off the request handler thread (which takes 1 second in my code above).
- Warp immediately calls `accept()` again to wait for the next client. This fails with `EMFILE`. Warp immediately prints `Network.Socket.accept: resource exhausted (Too many open files)`, but does not terminate.
- Warp waits until all previous request handlers are done. It no longer accepts new requests; they hang on the client side.
- Then Warp exits.
This was bad because:
- Any long-running request handler (e.g. a slow streaming up/download) will keep Warp alive, but in a state where new clients like `curl` hang permanently.
- These clients continue to hang even if lots of FDs have been freed up in the meantime.
From this perspective, the change from #830 made sense:
It allows Warp to accept new clients, resolving their hangs as soon as FDs become free.
So I'm in general in favour of that change from #830; it's just that the implementation isn't great (busy-looping without sleeping).
> Warp waits until all previous accept handlers are over.
This is made worse by the fact that Warp does not detect client disconnects (#196).
It means that the `EMFILE` issue can be triggered very easily in realistic setups.
For example, consider the common scenario that you have Warp+Postgres web app. For uptime checking, you use e.g. AWS Route53 health checks, Consul health checks, or some uptime checking service such as Pingdom.
These HTTP checks usually have some HTTP timeout set, e.g. 5 seconds.
Let's say you implement a route `/db-readable` in Warp and point your health check at it.
Assume that when the DB goes down, the Postgres connection from the Warp request handler blocks.
Then the health check will query `/db-readable`, give up, and disconnect after 5 seconds.
But Warp does not notice the disconnect. It leaks the FD for a very long time (until the default TCP timeout of 3600 seconds).
(The slowloris timeout does not help with this, as the client has already fully sent the request, and it's now on the server to produce a response.)
If your health checks come in once per second, you'll exceed 1024 leaked FDs in less than half an hour.
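A quick back-of-envelope check of that claim, with the numbers from the scenario above (one leaked FD per check, one check per second):

```haskell
-- Time until a given FD limit is exhausted by a steady leak rate.
secondsToExhaustFds :: Int -> Int -> Int
secondsToExhaustFds fdLimit leaksPerSecond = fdLimit `div` leaksPerSecond

-- secondsToExhaustFds 1024 1 == 1024 seconds, i.e. ~17 minutes,
-- well under half an hour.
```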
Other web servers do not have this problem:
I show how `nc` and `nginx` detect disconnects here: #196 (comment)
@nh2 Thank you for analyzing.
Could you send a PR according to your recommendations?
@nh2 gentle ping
@kazu-yamamoto Unfortunately I haven't had the time yet to look into the implementation. I'll post here in case I start it!
I'm still getting this on warp-3.3.31, and it's pretty devastating to see the server busy-loop and take down production.
@kazu-yamamoto I had time to look now again.
I think that simply reverting that commit, as done in PR #933, will not improve the situation because it will just re-introduce the issues mentioned in #928 (comment) (which describes the issues without that commit).
I suspect we really have to implement the approach from "My recommendations" in #928 (comment).