yesodweb/yesod

Yesod is catching Timeout, which breaks the timeout middleware

pbrisbin opened this issue · 10 comments

It seems Yesod is catching the Timeout exception used by the standard timeout function and turning it into a 500 instead of letting to get back to timeout. This means that when we use the Timeout middleware, instead of getting a 503 as desired, we get see InternalError "<<timeout>>" come through as a 500.

Steps

Given the following executable:

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}

module Main where

import Control.Concurrent (threadDelay)
import Data.Maybe (fromMaybe)
import Data.Text (Text)
import Network.HTTP.Types.Status (status200)
import Network.Wai (responseLBS)
import Network.Wai.Handler.Warp as Warp
import Network.Wai.Middleware.Timeout as Middleware
import System.Environment (lookupEnv)
import Text.Read (readMaybe)
import Yesod hiding (respond)

data App = App

mkYesod "App" [parseRoutes|
/ RootR GET
|]

getRootR :: Handler Text
getRootR = do
    liftIO $ threadDelay $ 3 * 1000000
    pure "Hello word\n"

instance Yesod App

simpleApp :: Application
simpleApp _req respond = do
    threadDelay $ 3 * 1000000
    respond $ responseLBS status200 [] "Hello world\n"

main :: IO ()
main = do
    app <- do
        x <- lookupEnv "YESOD"
        if x == Just "1" then toWaiAppPlain App else pure simpleApp

    port <- fromMaybe 3000 . (readMaybe =<<) <$> lookupEnv "PORT"
    Warp.run port $ Middleware.timeout 2 app

Expected behavior

When run using the simpleApp, it does as expected (a 503):

stack exec -- timeout-example
% curl -D - localhost:3000
HTTP/1.1 503 Service Unavailable
Transfer-Encoding: chunked
Date: Mon, 27 Jun 2022 20:35:32 GMT
Server: Warp/3.3.21

Actual behavior

When run using App with yesod, the Timeout turns into a 500:

% YESOD=1 stack exec -- timeout-example
27/Jun/2022:16:40:38 -0400 [Error#yesod-core] <<timeout>> @(yesod-core-1.6.23.1-BngrWz5NDXK4xc2KyG7z7M:Yesod.Core.Class.Yesod src/Yesod/Core/Class/Yesod.hs:691:6)
% curl -D - localhost:3000
HTTP/1.1 500 Internal Server Error
Date: Mon, 27 Jun 2022 20:40:38 GMT
Server: Warp/3.3.21
Content-Length: 157
Content-Type: text/html; charset=utf-8
Set-Cookie: _SESSION=vYgezGHwk6Qauaj0RdmWc3XmIZh0HaWbZ7phPFzTHrdoLpzE/UrxshoB+rp1pgMbzE5PUMWbb65dIfdCY30rYn9QWkh21VocKCGttI0FAgzT1lH9F4Owj2sWZ/PqFfBq0Daylw0SHfGt230IZZ18oT5je1RR1CGjD00WkZlY3KrUivN0y2E=; Path=/; Expires=Mon, 27-Jun-2022 22:40:36 GMT; HttpOnly
Vary: Accept, Accept-Language
X-XSS-Protection: 1; mode=block

<!DOCTYPE html>
<html><head><title>Internal Server Error</title></head><body><h1>Internal Server Error</h1>
<pre>&lt;&lt;timeout&gt;&gt;</pre>
</body></html>%

Versions

The above does not reproduce in lts-19.0, and appears in lts-19.1. From that, I assumed the bug is in yesod-core-1.6.22.0. I eventually found this code, which appears kind of intentional.

Yes this indeed looks intentional, #1753

it's better if yesod handles [async exceptions] so that we get a nice rendered error page.

I think Timeout was an oversight, but I agree with this benefit generally. I'm not sure if we can have it both ways?

I've found another circumstance in which this has broken our error-handling/reporting:

We see a lot of InvalidRequest Warp: Client closed connection prematurely noise in our logs starting with this update. Warp has always considered this a "normal" exception not worth reporting (and I agree),

By Yesod taking over and not letting it get to Warp, the defaultShouldDisplayException behavior (which we leveraged to keep this noise out of our logs) no longer works.

Only thing I can think of is adding a specific list of exceptions inside Yesod that it should not catch and should instead propagate.

Well, I've been thinking about this quite a bit. If it were up to me, the PR should be reverted based on the advice in your blog post on the matter:

[W]ith asynchronous exceptions, you never want to recover ... If you swallow those exceptions ... you break the very nature of the async exception mechanism. [W]ith async exceptions, you are allowed to clean up, but never recover.

I'd considering rescuing to the framework's error page a form of "recovery". And I think violating that rule will continue to run us into edge-case trouble. Timeout is just the canary in the coal mine.

Your original justification for this fix was,

I can't think of a good reason to let async exceptions propagate in this kind of case, since ultimately Warp will catch and (poorly) handle those anyway

I think Timeout is a "good reason" and I've also mentioned that we actually want Warp to handle them because it owns more knowledge about when things can be safely ignored. If you find Warp's handling "poor", you can modify onException. I'd dig into the user-requirement that async exceptions render "my Yesod error page" and see if maybe "a better error page" is acceptable.

Correct me if I'm wrong, but I think the original Issue (#1751) could also be solved in user code through Yesod middleware,

instance Yesod App where
  yesodMiddleware h =
    defaultYesodMiddleware h `catchSyncOrAsync` \ex ->
      -- render your 500 page

We could even supply an out of the box middleware for it, and even include it in defaultYesodMiddleware. That seems like a relatively elegant way to make this an opt-in/out for users.

That's just my 2c though. A safe-list approach would certainly solve the Timeout case, which seems like the most pressing "bug".

I see now the original issue already mentioned it's going against the blog post's advice, and the complexity with re-throwing AllocationLimitExceeded in particular means my middleware suggestion wouldn't work and they'd end up with the workaround they were already doing anyway. So I guess I'm not really raising anything new here, apologies.

I honestly don't feel great about any of this. Pinging @jappeace. @pbrisbin is raising some valid concerns with the previous async exception behavior change. Here's how I'd summarize the situation:

  • Generally, catching an async exception is a bad thing
  • However, in the web server context, that's not true. Since each connection is handled by its own green thread, catching those async exceptions and turning them into valid responses to the client makes all the sense in the world.
  • The issue here is: who should do the catching?
    • @jappeace wants the catching done by Yesod, since it allows for better error output
    • @pbrisbin wants to rely on WAI middlewares and some of Warp's built-in behavior

I can't really say that either is "more correct" than the other. A configuration option to determine which async exceptions (if any) should be caught by Yesod is the best middle ground I can think of.

Thanks for the summary Michael!

in the web server context, that's not true. ... catching those async exceptions and turning them into valid responses to the client makes all the sense in the world.

FWIW, I think we should be careful to distinguish Yesod as the web framework and Warp as the web server. I agree with you that the server bounding the request thread in error handling and ensuring valid client responses makes complete sense. Which is why I consider the framework taking that ability away from the server an over-reach.

That said,

A configuration option to determine which async exceptions (if any) should be caught by Yesod

Would absolutely work for us.

Making it configurable would also work for us.
I think we don't want to catch the connection closed exception (doesn't matter since the connection is closed after all), but we do want to catch the timeout one.

Cool. Would one of you be interested in submitting a PR with this kind of change?

yes, I"ll try doing this wensday