Azure/azure-functions-dotnet-worker

Fatal exceptions cause function to hang until timeout

MathiasAugustesen opened this issue

Description

Summary
Fatal exceptions thrown in an Azure Function cause the HTTP request to hang until it times out. The hung invocation appears to be billed as 5 minutes of execution, which means that what I believe is a bug in Azure Functions could result in accidental bills of thousands of dollars for a job with many failing requests.

Versions
.NET version: .NET 9.0.0
.NET Azure Functions package versions: 2.0.0

Description
While experimenting with throwing different exceptions in user code, I found that some exceptions caused the HTTP request to hang until timeout (5 minutes on the Consumption plan). Further experiments suggest that these are the exceptions for which IsFatal returns true.

The error can be reproduced locally on my MacBook with an M1 Max CPU (model number: MK1A3DK/A).
I also have access to a .NET 8 function app using version 1.22.0 of Microsoft.Azure.Functions.Worker, where throwing a fatal exception does not cause the request to hang; instead, it immediately returns a 500 response.

Expected
I would expect the request to return an error immediately, so that I am only billed for the execution time of the request (plus perhaps a little overhead).

Timeline of actual behavior
Using Application Insights telemetry, I can see the following sequence of events:

  1. Function is triggered through HTTP
  2. Execution starts: Executing 'Functions.[FUNCTION_NAME]' (Reason='This function was programmatically called via the host APIs.')
  3. An unhandled fatal exception in user code bubbles up through the HttpTrigger method and beyond (it is logged immediately on the Failures page in Application Insights). POST /api/[FUNCTION_NAME] returns 200, and InProc Invoke is reported as successful.
  4. The request hangs.
  5. The request is cancelled after the timeout period (or never, when testing locally).

Extra comments
Fatal exceptions were previously handled in the InvocationHandler. This behavior was changed in #2789.

I suspect that the timeout is billed as 5 minutes of execution. If that is the case, a job with many requests that result in fatal exceptions could run up a bill of thousands of dollars or more in a single day or night from execution costs alone. If an OutOfMemoryException occurs in the worker process, I don't see how I could even wrap my function in a try/catch to mitigate it. I would be very interested in hearing about any temporary workarounds. As things stand, it is too risky cost-wise for me to start using the function at scale.
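To put the cost concern in numbers, here is a rough Python sketch of the arithmetic, treating billed execution as roughly invocations × duration × memory × rate per GB-second. The request volume, the 0.5 GB of billed memory, the 0.2 s fail-fast duration, and the $0.000016/GB-s rate are all assumptions chosen for illustration, not measured values or quoted Azure prices.

```python
"""Rough, hypothetical estimate of execution cost wasted by hung invocations.

All input values below are assumptions for illustration, not quoted Azure prices.
"""


def execution_cost_usd(invocations: int, seconds_each: float,
                       memory_gb: float, usd_per_gb_s: float) -> float:
    """Cost of `invocations` runs, each billed for `seconds_each` at `memory_gb`."""
    gb_seconds = invocations * seconds_each * memory_gb
    return gb_seconds * usd_per_gb_s


# Assumed inputs (hypothetical): 1M failing requests in a day, 0.5 GB billed
# memory per execution, and an illustrative rate of $0.000016 per GB-s.
FAILED_REQUESTS = 1_000_000
MEMORY_GB = 0.5
RATE = 0.000016

# Each failing request hangs until the 5-minute (300 s) timeout.
hung = execution_cost_usd(FAILED_REQUESTS, 300.0, MEMORY_GB, RATE)
# Hypothetical fail-fast case: the request returns a 500 after about 0.2 s.
fast = execution_cost_usd(FAILED_REQUESTS, 0.2, MEMORY_GB, RATE)

print(f"hung until timeout: ${hung:,.2f}")
print(f"fail fast:          ${fast:,.2f}")
```

With these inputs the script prints $2,400.00 for the hung case versus $1.60 when failing fast. Whatever memory size and rate you plug in, hanging until the timeout costs 300 s / 0.2 s = 1,500 times more than failing fast.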

Steps to reproduce

I've created a sample repository with a minimal example that reproduces the problem. You should be able to git clone it, cd into it, and run func start from there. Then call the function.