Nike-Inc/hal

Logging from exception handler

JackKelly-Bellroy opened this issue · 3 comments

CC @Unisay and @lrworth, who work with me on Lambda stuff.

We've had a couple of situations where lambdas died due to unhandled exceptions thrown by handler code, leaving no apparent trace in CloudWatch logs. We could install a catch-all exception handler in our internal convenience libraries, but since hal already has a handler of last resort, we could use that to emit unhandled exceptions to stderr:

Left (Left msg) ->
-- If an exception occurs here, we want that to propogate
sendEventError rcc reqId msg

The only problem I see with this is the risk of leaking information into logs - we should discuss this.

Related: bellroy/wai-handler-hal#4 . I could provide one in wai-handler-hal, but I think that it's better fixed in hal itself.

I would agree that hal is the better place for it. This has caught my attention before and I've even on occasion had to add it myself for debugging.

The only reason I hadn't done this prior is because I was very surprised that AWS Lambda doesn't do this already--the assumption of which is why it was omitted here. I'm not sure why their runtime accepts a message along with a reported error, considering that that message then seems to disappear. I would like to understand this a bit better before completely committing to logging it (but also it's needed and been delayed long enough that I wouldn't want to hold it up too long if a suitable answer never manifests).

As for leaking information, it's certainly a risk, but I think this is something that is reasonably left to the application developer. We can call this out in the README, at least. If errors need to be sanitized, they'll need to ensure that that gets done. Also, while you don't want sensitive data in CloudWatch, this is a much safer place than something like just sending back error bodies in HTTP 5XXs where clients can see them. In theory, CloudWatch access is already a much more limited group than "arbitary clients." Risk is by no means zero, but it seems like appropriate risk to place in the hands of application developers for the benefit.

I ran an experiment, the guts of which were the following function:

newtype Request = Request Text

instance FromJSON Request where
  parseJSON = withObject "Request" $ \o ->
    Request <$> o .: "message"

main :: IO ()
main = do
  hSetBuffering stderr NoBuffering
  let t = T.pack "lol"
  mRuntime $ \(Request t) -> do
    hPutStrLn stderr $ "About to throw: " <> show t
    error $ T.unpack t :: IO ()

(Full reproduction at: https://github.com/bellroy/hal-exceptions-test )

Using the function test interface in the AWS Lambda Console, I received the following JSON response from Lambda:

{
  "errorType": "User",
  "stackTrace": [],
  "errorMessage": "kaboom!\nCallStack (from HasCallStack):\n  error, called at Main.hs:23:5 in main:Main"
}

This matches the data structure you're sending here:

setRequestBodyLBS (encode (LambdaError { errorMessage = e, stackTrace = [], errorType = "User"}))

So that explains where it goes. We just don't ever get to see it, because we're often using API Gateway (REST) APIs which call Lambda on our behalf. API GW's error handling does not appear to propagate or log the error from Lambda, possibly unless you also add $context.integrationErrorMessage to your logging. I haven't tested this with an API Gateway yet, but it could be where the Lambda error comes out of API Gateway.

All that said, I think that for all our sanity, we should probably unconditionally log exceptions just before hal reports them back to Lambda. Otherwise everything except the exception itself appears in the Lambda logs, which is frustrating and confusing. Since Lambda does report exception bodies already, we leak no additional information IMHO.