Async ResultPublishers could deadlock sync experiments
thematthopkins opened this issue · 21 comments
Thanks for the great library!
One issue we ran into is when using an async ResultPublisher for synchronous experiments within an MVC app.
The controller below will deadlock on every request, because of the awaiting on Task.Delay() within the synchronous web request.
This article has a more in-depth explanation on why:
http://blog.stephencleary.com/2012/07/dont-block-on-async-code.html
It looks like ConfigureAwait could potentially be used for a fix, or possibly adding a separate ResultPublisher method that gets invoked on synchronous calls. Our current workaround is to use sync implementations of Publish without any awaits.
using System.Threading.Tasks;
using System.Web.Mvc;
using GitHub;
namespace ScientistDeadlockingResultPublisher.Controllers
{
public class AsyncResultPublisher : IResultPublisher
{
public async Task Publish<T, TClean>(Result<T, TClean> result)
{
await Task.Delay(5000);
}
}
public class DeadlockController : Controller
{
// GET: Deadlock
public ActionResult Index()
{
Scientist.ResultPublisher = new AsyncResultPublisher();
var result = Scientist.Science<int>("myExperiment", e =>
{
e.Use(() => 2);
e.Try("candidate", () => 1);
});
return Content("done");
}
}
It looks like
ConfigureAwait
could potentially be used for a fix
To make sure I understand, you're suggesting that when Scientist calls IResultPublisher.Publish
, it should append a ConfigureAwait(false)
call to that?
Could you try it out in your example here by putting Task.Delay(5000).ConfigureAwait(false);
and make sure that doesn't deadlock?
I tested out a couple approaches. The problem with ConfigureAwait(false)
is that I had to add it on every point up the call stack. Meaning:
`Task.Delay(5000).ConfigureAwait(false);``
But also changing ExperimentInstance
to use
await Scientist.ResultPublisher.Publish(result).ConfigureAwait(false);
As well as doing the same for the top level invocation from Scientist
.
It would also require the implementation of the ResultPublisher
to include it in all descendant async calls.
It doesn't feel like a great solution, since the ResultPublisher
may need to invoke some async libraries that likely don't come pre-wired with ConfigureAwait(false)
.
Another solution I tested was to switch the invocation of the ResultPublisher.Publish
to fire-and-forget as mentioned in the comments:
#pragma warning disable 4014
Task.Run(async () =>
{
await Scientist.ResultPublisher.Publish(result);
});
#pragma warning restore 4014
That actually works beautifully and avoids the deadlock scenario completely. The problems with that approach would be:
- It would break a number of unit tests that verify the
ResultPublisher
call. We'd have to create something like anIResultPublisherRunner
that could be faked out with a non-fire-and-forget implementation. - If anyone's accessing something like a static
HttpContext.Current
in theirResultPublisher
, it would become inaccessible after the switch-over to fire and forget.
A good solution for now that avoids these issues seems to be to use fire-and-forget within our ResultPublisher
itself:
public class FireAndForgetResultPublisher : IResultPublisher
{
public Task Publish<T, TClean>(Result<T, TClean> result)
{
#pragma warning disable 4014
Task.Run(async () =>
{
await Task.Delay(5000);
});
#pragma warning restore 4014
return Task.FromResult((object)null);
}
}
This would still allow us grab info from something like HttpContext.Current
before we enter the fire-and-forget bit.
@thematthopkins I really like the idea of the FireAndForgetResultPublisher
. It should probably be a wrapper publisher that allows you to pass in your real publisher. So something like:
public class FireAndForgetResultPublisher : IResultPublisher
{
IResultPublisher _publisher;
public FireAndForgetResultPublisher(IResultPublisher publisher)
{
_publisher = publisher;
}
public Task Publish<T, TClean>(Result<T, TClean> result)
{
#pragma warning disable 4014
Task.Run(async () =>
{
await _publisher.Publish(result);
});
#pragma warning restore 4014
return Task.FromResult((object)null);
}
}
@thematthopkins interested in contributing this?
@joncloud what I like about @thematthopkins's suggestion is we won't need to change our unit tests. They'll still use the InMemoryResultPublisher
. Instead, we'll just make this FireAndForgetResultPublisher
available to users, but it won't be set by default.
Got it - So the application developer can choose to opt-in for synchronous vs asynchronous publishing.
Got it - So the application developer can choose to opt-in for synchronous vs asynchronous publishing.
Exactly. In the example that @thematthopkins pointed out, the app developer might need to access HttpContext.Current
during publishing. If we always publish async, their result publisher won't have a chance to store that value before publishing.
This approach gives the developer control, while making the async scenario very easy.
Just curious: on the implementation above is there a reason that Task.FromResult((object)null)
is used instead of Task.CompletedTask
?
Just curious: on the implementation above is there a reason that Task.FromResult((object)null) is used instead of
Task.CompletedTask
?
We probably should use that. Is that a new property?
Yeah fairly recently. I just double checked and it was added in .NET 4.6. That'd mean we'd have to change our framework targeting (core would have to go to at least 1.3). Probably not worth the effort. Though if we wanted to we could create the task once and cache it like the actual implementation of Task.CompletedTask
does.
Also @thematthopkins let me know and I can help on this contribution if necessary.
@joncloud: I looked through your implementation and it uncovered an issue with the FireAndForgetResultPublisher
approach. It's difficult to ensure that the registered exception handler is called. Your implementation handles this nicely, where the FireAndForgetResultPublisher
would require IResultPublisher
be modified to take an Action<Exception> onThrown
.
public Task Publish<T, TClean>(Result<T, TClean> result, Action<Exception> onThrown)
{
Task task = _innerResultPublisher.Publish(result);
#pragma warning disable 4014
task.ContinueWith(_ =>
{
if (task.IsFaulted)
{
onThrown(task.Exception.GetBaseException());
}
});
#pragma warning restore 4014
return Task.FromResult((object)null);
}
Making the IResultPublisher
responsible for its own exception handling doesn't seem right.
With this in mind, I think it's probably best to go with your approach. If context needs to be added from something like HttpContext.Current
, it could be done manually within the experiments, or there could be something like a static ContextProvider
. I could put together the pull request for an ContextProvider
if that seems like a good direction.
It's difficult to ensure that the registered exception handler is called
Why is that?
I could put together the pull request for an ContextProvider if that seems like a good direction.
Let's avoid adding anything until someone has a real need for it.
In the current implementation, the ResultPublisher gets invoked from ExperimentInstance like:
try
{
await Scientist.ResultPublisher.Publish(result);
}
catch (Exception ex)
{
Thrown(Operation.Publish, ex);
}
If this invokes an ExceptionThrowingFireAndForgetResultPublisher:
public class ExceptionThrowingFireAndForgetResultPublisher : IResultPublisher
{
public Task Publish<T, TClean>(Result<T, TClean> result)
{
#pragma warning disable 4014
Task.Run(async () =>
{
await Task.Delay(5000);
throw new Exception();
});
#pragma warning restore 4014
return Task.FromResult((object)null);
}
}
Then the Exception never gets caught by the ExperimentInstance, and the registered exception handler is never called (I ran this to confirm that the exception is not caught). @joncloud's solution handles this case correctly by calling the registered exception handler from within the async block.
Oh I see. That leads to a potential different problem though. Right now, we can expect that the Thrown
callback is run on the current thread. If someone uses the FireAndForgetPublisher
, it could mean Thrown
is called on a different thread.
That might not matter. But in some cases it might.
Another option we could do is not change the Publish
signature, but instead pass in the Thrown
callback to the constructor of the FireAndForgetResultPublisher
. Wouldn't that also solve this problem?
Just to be clear are you proposing something like:
Scientist.ResultPublisher = new FireAndForgetResultPublisher(..., ExceptionMonitor.Observe);
Scientist.Science("...", experiment => {
experiment.Thrown(ExceptionMonitor.Observe);
});
class ExceptionMonitor {
public static void Observe(Operation operation, Exception exception);
}
The problem with putting it in the constructor is that the FireAndForgetResultPublisher
would be static and reused across experiments. Thrown
is set per-experiment, so would have to be passed in per Publish
call.
Making a ResultPublisher per-experiment would allow us to move it to the constructor. Making ResultPublisher per-experiment would also be nice because each experiment would then be self-contained and not have to depend on separate setup of the static ResultPublisher.
Another potential solution may be to remove support for Operation.Publish
from the enum, and say Publishers are responsible for handling their own errors. That would also get rid of the ambiguity about the thread Thrown
gets run on.
If i'm reading the ruby implementation correctly it looks like the actual publishing happens in the experiment instance itself, so that would align the closer if the result publisher wasn't a singleton.
In regards to Operation.Publish
I think you're right. It could definitely provide focus about where exception handling should take place. The question though is how closely should the APIs mimic the ruby implementation. .NET definitely is different because of our support for async/await, and the Task
based result publishing implementation.
I don't think we need to adhere strictly to the Ruby implementation given we have the benefit of tasks and they don't.
However, I have a thought. When you consider the "Forget" part of FireAndForget
, it would be fair to interpret that as even exceptions are forgotten. And that the exception handling around the Publish
method in this case means that it'll catch exceptions when initiating the fire and forget, but after that, nothing.
In other words, I'm fine that the FireAndForgetPublisher
as suggested would not marshal exceptions to the Thrown
handler.
However, some users may still want to log exceptions in the publisher so they know that experiments are going well or not. We could do that via a basic onThrown
constructor argument.
Right now, I do want to keep the publisher as a singleton because it's unlikely that people would want per experiment publishers and having to set one up each time you set up an experiment seems like unnecessary work.
Thoughts?
That makes a lot of sense. Going back to your original statement I think approaching the simplest solution would be very appropriate. If there are additional needs, then we can always revisit and provide more functionality.
Yeah, I like the idea a lot as well. Added a pull request that borrows heavily from @joncloud's implementation for the unit tests.
When onPublisherException
is not supplied, the exception thrown from the resultpublisher left unhandled. Couldn't figure out a way to unit test that bit though, since the exception would be thrown from wherever the fire-and-forget thread is running, and hard to catch.