Randomly occurring NullReferenceException
antonpopov opened this issue ยท 7 comments
Hi! All of our webservices use ProblemDetails middleware. However, going through the logs we found out that in some cases (but not always) when an exception occurs during the execution of the application another exception occurs in the ProblemDetails middleware. Here's the stack trace:
An exception was thrown attempting to execute the problem details middleware.
System.NullReferenceException: Object reference not set to an instance of an object.
at Microsoft.AspNetCore.Mvc.Infrastructure.DefaultOutputFormatterSelector.ValidateContentTypes(MediaTypeCollection contentTypes)
at Microsoft.AspNetCore.Mvc.Infrastructure.DefaultOutputFormatterSelector.SelectFormatter(OutputFormatterCanWriteContext context, IList`1 formatters, MediaTypeCollection contentTypes)
at Microsoft.AspNetCore.Mvc.Infrastructure.ObjectResultExecutor.ExecuteAsyncCore(ActionContext context, ObjectResult result, Type objectType, Object value)
at Hellang.Middleware.ProblemDetails.ProblemDetailsMiddleware.WriteProblemDetails(HttpContext context, ProblemDetails details)
at Hellang.Middleware.ProblemDetails.ProblemDetailsMiddleware.HandleException(HttpContext context, ExceptionDispatchInfo edi)
What's really odd is that we couldn't find a pattern. We couldn't reproduce the issue by executing some of the failing requests in debug or when the app is deployed. All we can see is that sometimes the beforementioned NullReference exception is thrown.
Any help or guidance will be much appreciated!
Target runtime: NET6.0
Hellang.Middleware.ProblemDetails version: 6.4.0
Hmm. That's a really weird one. Looks like it's happening deep in MVC code. Forunately, the ValidateContentTypes
only has a single call site:
It has a precondition which asserts that contentTypes
is not null
, which leaves very little to be null
in this method:
๐ค
And the code for ValidateContentTypes
hasn't been touched since 2017. Could it be some race condition cause by HttpContext
being used outside a request scope or something?
Hmm, could it be that the requests have a wildcard Accept header and it's the exception message that fails?
It tries to get the FormatObjectResult_MatchAllContentType
resource for the exception message, but I can't find any reference to that resource in the repo. However, I found this MatchAllContentTypeIsNotAllowed
resource, which I can't see used anywhere in code. I wonder if that's the issue?
In any case, this is probably an issue for dotnet/aspnetcore ๐
Nope, that's not it. When generating methods for strings containing formatting arguments for the Resources
class, the generator adds a Format
prefix, which explains why I didn't find any references to it. There's indeed an ObjectResult_MatchAllContentType
resource ๐
And the code for
ValidateContentTypes
hasn't been touched since 2017. Could it be some race condition cause byHttpContext
being used outside a request scope or something?
I think that's not the case. However we are still investigating and trying to reproduce the issue. Thanks for the time.
We experienced the same problem but we figured out it was caused by being on an old version of ProblemDetails. What worries me is that the OP reports using a version that, according to my investigation, should have this problem fixed. We haven't had time to thoroughly check the issue no longer occurs but early results look promising.
Maybe some information about the problem will be helpful:
The NullReferenceException
happens if the ContentTypes
property contains a null element. This causes DefaultOutputFormatterSelector.ValidateContentTypes
(source) to throw when calling the constructor of MediaType
, as it attempts to access the Length
property of the null string (source). Which leads to the question, how does an null element end up in ContentTypes
?
ProblemDetails 5.4.0 provided the Executor
with a reference to Options.ContentTypes
(source). This was problematic, as this is effectively a singleton object, reused for every request, but is implemented as a thread-unsafe List<>
. ObjectResultExecutor
modifies this collection by adding two elements to it in InferContentTypes
(source). Growing the singleton collection by two elements every time the middleware formats a problem response is bad enough but the real problem is the risk of a race condition. If two threads execute InferContentTypes
concurrently, then the _size
of the list is incremented twice but both threads may end up assigning a new element to the same index of the array - leaving one space in the array unassigned. Hence a null value in ContentTypes
.
Once a null element ends up in ContentTypes
, it stays there until the process is restarted, and until then the exception occurs every time. If the service is load-balanced, then only some instances may be affected, giving an impression of randomness.
But this appears to have been fixed by ProblemDetails v5.4.1: the middleware no longer passes the original singleton collection to the Executor but it passes its clone instead.
Therefore I'm interested to know, is the OP definitely on 6.4.0 and not 5.4.0? Is it possible the race condition on ContentType
is caused elsewhere in their application?
I will post again on how we fare with the newer version of the NuGet.