jfhbrook/pyee

Handle possible race condition when using ExecutorEventEmitter

forslund opened this issue · 7 comments

We've had an error report in Mycroft with an exception when removing a once handler.

Been trying to reproduce and I think it's a race condition. We have a sequence which sets up a once listener waits for 5 seconds for the listener to be called. If the handler is called it returns immediately if the 5 seconds passes without the handler being triggered the listener is removed.

Sometimes (quite rarely) the handler gets called, 5 seconds times out and removes the listener. THEN the once wrapper tries to remove the listener.

A try/except block may be a simple solution to remove the exception log

def _wrapper(f):
            def g(*args, **kwargs):
                try:
                    self.remove_listener(event, f)
                except KeyError:
                    # log warning or something
                    pass  # or possibly return

                # f may return a coroutine, so we need to return that
                # result here so that emit can schedule it
                return f(*args, **kwargs)

            self._add_event_handler(event, f, g)

A proper solution would be to have a lock around the critical section(s)

Thanks for the report! I'll try to take a closer look this weekend.

Which class is this? Is this the BaseEventEmitter or the ExecutorEventEmitter? read the topic

(it might not matter, it sounds like the move is to make the base EE thread-safe anyhow)

Yeah might be just as well to handle it in the BaseEmitter.

I'll see if I can get some spare time and clobber together a suggestion...

mulling this over, I think it might not be a locking issue - like it sounds like what's happening is:

  • you set the once, which adds a handler to remove the listener not after it's fired but after the async handler completes
  • the event fires at t=0, kicking off an async handler
  • the async action is incomplete at t=1 and a timeout occurs, triggering a manual handler removal
  • the async action completes at t=2, triggering an automated handler removal of a handler that's already been removed

so while I could probably add a lock to the BaseEventEmitter around some of the fiddly bits (that's a good idea regardless), I think the answer here is to make the once handlers robust against the handler being removed out-of-band prior to the once's cleanup, and/or have the once remove its own listener prior to the async handler completing. something like that.

Yes that's pretty much what I think is happening as well.

I see two ways of making it more robust,

One: the simple try / except above
Two: Making explicit checks that event and f still exist in the event before removing the listener. This will require locking since check + pop cannot be atomic (afaik)

Edit: Since you've got much more insight than me in the architecture here, maybe you have a better idea?

Yeah no, we're of similar minds here. I mulled over making sure the remove runs on the same tick as the trigger, rather than waiting for the callback to complete - I think this is more correct anyway and might ninja it in there, as a digression - but you should still be able to trigger this for a synchronous call.