Handle possible race condition when using ExecutorEventEmitter
forslund opened this issue · 7 comments
We've had an error report in Mycroft with an exception when removing a once handler.
Been trying to reproduce and I think it's a race condition. We have a sequence which sets up a once
listener waits for 5 seconds for the listener to be called. If the handler is called it returns immediately if the 5 seconds passes without the handler being triggered the listener is removed.
Sometimes (quite rarely) the handler gets called, 5 seconds times out and removes the listener. THEN the once wrapper tries to remove the listener.
A try/except block may be a simple solution to remove the exception log
def _wrapper(f):
def g(*args, **kwargs):
try:
self.remove_listener(event, f)
except KeyError:
# log warning or something
pass # or possibly return
# f may return a coroutine, so we need to return that
# result here so that emit can schedule it
return f(*args, **kwargs)
self._add_event_handler(event, f, g)
A proper solution would be to have a lock around the critical section(s)
Thanks for the report! I'll try to take a closer look this weekend.
Which class is this? Is this the BaseEventEmitter or the ExecutorEventEmitter? read the topic
(it might not matter, it sounds like the move is to make the base EE thread-safe anyhow)
Yeah might be just as well to handle it in the BaseEmitter.
I'll see if I can get some spare time and clobber together a suggestion...
mulling this over, I think it might not be a locking issue - like it sounds like what's happening is:
- you set the once, which adds a handler to remove the listener not after it's fired but after the async handler completes
- the event fires at t=0, kicking off an async handler
- the async action is incomplete at t=1 and a timeout occurs, triggering a manual handler removal
- the async action completes at t=2, triggering an automated handler removal of a handler that's already been removed
so while I could probably add a lock to the BaseEventEmitter around some of the fiddly bits (that's a good idea regardless), I think the answer here is to make the once handlers robust against the handler being removed out-of-band prior to the once's cleanup, and/or have the once remove its own listener prior to the async handler completing. something like that.
Yes that's pretty much what I think is happening as well.
I see two ways of making it more robust,
One: the simple try / except above
Two: Making explicit checks that event
and f
still exist in the event before removing the listener. This will require locking since check + pop cannot be atomic (afaik)
Edit: Since you've got much more insight than me in the architecture here, maybe you have a better idea?
Yeah no, we're of similar minds here. I mulled over making sure the remove runs on the same tick as the trigger, rather than waiting for the callback to complete - I think this is more correct anyway and might ninja it in there, as a digression - but you should still be able to trigger this for a synchronous call.