nats-io/nats.net.v2

Avoid multicast delegates (event) in favor of Func<args, Task> to allow for async event handlers

Closed this issue ยท 20 comments

mnmr commented

Proposed change

Replace usages of the event keyword with an option to pass a Func<args, Task> (i.e. passing a callback instead of += delegate).

Use case

If you want to run async code in event handlers you are forced to use either sync-async, which is known to cause deadlocks if you're not extremely knowledgable about what is going on, or put the event into a queue for processing by a separate task.

For instance, in the ConnectionDisconnected event handler, users may want to call ConnectAsync to reconnect.

Contribution

I'll be happy to make this change and submit a PR.

The only thing I'm not sure how to solve is how to allow folks to unsubscribe (the implementation of "event" has special magic sauce that does the right thing even if you pass in a new delegate, so perhaps the solution is findable). If there is no easy solution then the code will likely need to return an "event handler id" that can be used to unsubscribe.

mtmk commented

The += and -= works with Action properties I wonder if that's true for Funcs?

mtmk commented

Another option may be mimicking CancellationToken.Register() and spit out some disposable registration object?

The += and -= works with Action properties I wonder if that's true for Funcs?

It does, but the onus is on the caller to assign a local function or regular method so that it can be removed with the same reference. Adding a bare lambda means you lose the reference, but in 99 cases out of 100, people don't remove the handlers anyway and let shutdown clean it all up.

edit: this is the same issue with Action of course

mnmr commented

Perhaps it is worth considering if there are any events that benefit from supporting multiple handlers?

If we can limit things to a single handler, then unsubscribing becomes a trivial issue.

mtmk commented

We also have a pattern of passing callbacks as read-only Funcs in options. We could move all the events to NatsOpts as nullable Func<Task> properties. Since they'd be read-only we shouldn't have add/remove issues.

@mtmk Might be better to use https://learn.microsoft.com/en-us/dotnet/api/microsoft.visualstudio.threading.asynceventhandler-1?view=visualstudiosdk-2022. Funcs have the limitation that only one can be attached where handlers can have multiples (this depends on implementation of the Func assignment of course).

Also, making it part of opts makes it useless for injected systems as the objects would be initialized prior to injection.

mtmk commented

@mtmk Might be better to use https://learn.microsoft.com/en-us/dotnet/api/microsoft.visualstudio.threading.asynceventhandler-1?view=visualstudiosdk-2022. Funcs have the limitation that only one can be attached where handlers can have multiples (this depends on implementation of the Func assignment of course).

@sspates I guess we can just swap out EventHandler and we'd be done. Is the code for AsyncEventHandler available in a form we can use in our codebase? Or we should create one ourselves?

mtmk commented

It might be better though to just use the vs threading library directly.

@sspates I don't think we should create a dependency for this. We can just implement the bits we need.

@mtmk Looks like you need to invoke the handler in the Nats.Net v2 code using
Delegate.GetInvocationList() and then await with Task.WhenAll() otherwise it will only await the first hander.

mnmr commented

I expect to get started on this tomorrow.

mnmr commented

I've hit a few problems with this that I'm not sure how best to resolve:

  1. OnError is invoked from an Action passed when creating the bounded channel. We'd either have to wait synchronously for the error handlers to complete or spawn a fire-and-forget task to do the error reporting. Alternatively, we can keep OnError as a normal/sync event.
  2. ConnectionOpened is called inside a lock statement, so also cannot be awaited without changing the locking mechanism or moving the code outside the lock. The code is spawning a task to run ReconnectLoop, so we could move the event signaling to there (requires passing the url and possibly a boolean to determine whether ReconnectLoop should call the handlers).

Thoughts?

mtmk commented
  1. OnError is invoked from an Action passed when creating the bounded channel. We'd either have to wait synchronously for the error handlers to complete or spawn a fire-and-forget task to do the error reporting. Alternatively, we can keep OnError as a normal/sync event.

or maybe a new channel for error events?

  1. ConnectionOpened is called inside a lock statement, so also cannot be awaited without changing the locking mechanism or moving the code outside the lock. The code is spawning a task to run ReconnectLoop, so we could move the event signaling to there (requires passing the url and possibly a boolean to determine whether ReconnectLoop should call the handlers).

good point. reconnect loop might be a better place to do it. I don't think we get connection open events on reconnects at the moment. edit: looks like we should get the open events. I thought I wasn't getting them in one of my tests, but it might be me.

mnmr commented
  1. OnError is invoked from an Action passed when creating the bounded channel. We'd either have to wait synchronously for the error handlers to complete or spawn a fire-and-forget task to do the error reporting. Alternatively, we can keep OnError as a normal/sync event.

or maybe a new channel for error events?

I'm not sure I follow - the problem here is that the code from System.Threading.Channels doesn't allow for async, so creating a secondary channel wouldn't really help with that.

  1. ConnectionOpened is called inside a lock statement, so also cannot be awaited without changing the locking mechanism or moving the code outside the lock. The code is spawning a task to run ReconnectLoop, so we could move the event signaling to there (requires passing the url and possibly a boolean to determine whether ReconnectLoop should call the handlers).

good point. reconnect loop might be a better place to do it. I don't think we get connection open events on reconnects at the moment. edit: looks like we should get the open events. I thought I wasn't getting them in one of my tests, but it might be me.

One caveat with moving the call either outside the lock or to ReconnectLoop is that, well, it executes outside of the lock in either case. Which is likely fine for event handlers, but someone more knowledgable about the internals should probably say OK first :)

mtmk commented
  1. OnError is invoked from an Action passed when creating the bounded channel. We'd either have to wait synchronously for the error handlers to complete or spawn a fire-and-forget task to do the error reporting. Alternatively, we can keep OnError as a normal/sync event.

or maybe a new channel for error events?

I'm not sure I follow - the problem here is that the code from System.Threading.Channels doesn't allow for async, so creating a secondary channel wouldn't really help with that.

You TryWwrite() to an unbounded channel from within the Action and have a task started with the connection, reading the channel in an async delegate where you can invoke the event handlers with an await. Have a look at ReadLoopAsync / WriteLoopAsync for examples. Also the reason I'm suggesting this approach is also because there maybe other places we want to push error messages to the event handlers.

  1. ConnectionOpened is called inside a lock statement, so also cannot be awaited without changing the locking mechanism or moving the code outside the lock. The code is spawning a task to run ReconnectLoop, so we could move the event signaling to there (requires passing the url and possibly a boolean to determine whether ReconnectLoop should call the handlers).

good point. reconnect loop might be a better place to do it. I don't think we get connection open events on reconnects at the moment. edit: looks like we should get the open events. I thought I wasn't getting them in one of my tests, but it might be me.

One caveat with moving the call either outside the lock or to ReconnectLoop is that, well, it executes outside of the lock in either case. Which is likely fine for event handlers, but someone more knowledgable about the internals should probably say OK first :)

Sorry to disappoint but I might be your best bet :) We inherited this from AlterNats here is the original commit Cysharp/AlterNats@87bbff3

I personally can't think of any reason why it should stay in the lock for internals but I don't know there maybe an edge case for an application although I doubt it. I suggest we move it outside and make it clear in release notes that it might behave differently. Feel free to suggest other options.

mtmk commented

@sspates @mnmr fyi just merged #297 with a few failing tests which are related to the interface-class member mismatch. I assume they are because of the missing event handlers. they should be fixed after this issues is resolved.

mtmk commented

btw @mnmr feel free to implement only one event handler to establish the pattern. I happy to do the leg work of applying it to others in a follow-up PR. Just in case you don't have enough time on your hands ๐Ÿ˜…

mnmr commented

#324 is ready for review :)

PS: After resolving conflicts with commits made to main since I forked, I have an unused "using System.IO.Pipelines;" and a call to "SubscriptionManager.WriteReconnectCommandsAsync" that Rider complains about. As I'm not sure where these errors originate, I don't plan any updates to the PR to address these issues.

mnmr commented

btw @mnmr feel free to implement only one event handler to establish the pattern. I happy to do the leg work of applying it to others in a follow-up PR. Just in case you don't have enough time on your hands ๐Ÿ˜…

I never have enough time, I'm literally the only back-end developer for our rather huge solution.. but I get to set my own priorities, which allows for a bit of work on the side every so often :)

mtmk commented

I have an unused "using System.IO.Pipelines;" and a call to "SubscriptionManager.WriteReconnectCommandsAsync" that Rider complains about. As I'm not sure where these errors originate, I don't plan any updates to the PR to address these issues.

As long as solution build with no warnings we're good.

I never have enough time, I'm literally the only back-end developer for our rather huge solution.. but I get to set my own priorities, which allows for a bit of work on the side every so often :)

We absolutely do appreciate it! Thank you @mnmr ๐Ÿ™ โค๏ธ