Handle possible race condition when using ExecutorEventEmitter

Question

Handle possible race condition when using ExecutorEventEmitter

forslund opened this issue 4 years ago · 7 comments

We've had an error report in Mycroft with an exception when removing a once handler.

Been trying to reproduce and I think it's a race condition. We have a sequence which sets up a once listener waits for 5 seconds for the listener to be called. If the handler is called it returns immediately if the 5 seconds passes without the handler being triggered the listener is removed.

Sometimes (quite rarely) the handler gets called, 5 seconds times out and removes the listener. THEN the once wrapper tries to remove the listener.

A try/except block may be a simple solution to remove the exception log

def _wrapper(f):
            def g(*args, **kwargs):
                try:
                    self.remove_listener(event, f)
                except KeyError:
                    # log warning or something
                    pass  # or possibly return

                # f may return a coroutine, so we need to return that
                # result here so that emit can schedule it
                return f(*args, **kwargs)

            self._add_event_handler(event, f, g)

A proper solution would be to have a lock around the critical section(s)

Answer 1 · 2020-10-06T17:08:20.000Z

Thanks for the report! I'll try to take a closer look this weekend.

Answer 2 · 2020-10-06T17:08:54.000Z

~~Which class is this? Is this the BaseEventEmitter or the ExecutorEventEmitter?~~ read the topic

Answer 3 · 2020-10-06T17:09:22.000Z

(it might not matter, it sounds like the move is to make the base EE thread-safe anyhow)

Answer 4 · 2020-10-06T18:08:43.000Z

Yeah might be just as well to handle it in the BaseEmitter.

I'll see if I can get some spare time and clobber together a suggestion...

Answer 5 · 2020-10-06T19:06:07.000Z

mulling this over, I think it might not be a locking issue - like it sounds like what's happening is:

you set the once, which adds a handler to remove the listener not after it's fired but after the async handler completes
the event fires at t=0, kicking off an async handler
the async action is incomplete at t=1 and a timeout occurs, triggering a manual handler removal
the async action completes at t=2, triggering an automated handler removal of a handler that's already been removed

so while I could probably add a lock to the BaseEventEmitter around some of the fiddly bits (that's a good idea regardless), I think the answer here is to make the once handlers robust against the handler being removed out-of-band prior to the once's cleanup, and/or have the once remove its own listener prior to the async handler completing. something like that.

Answer 6 · 2020-10-06T19:16:18.000Z

Yes that's pretty much what I think is happening as well.

I see two ways of making it more robust,

One: the simple try / except above
Two: Making explicit checks that event and f still exist in the event before removing the listener. This will require locking since check + pop cannot be atomic (afaik)

Edit: Since you've got much more insight than me in the architecture here, maybe you have a better idea?

Answer 7 · 2020-10-06T19:32:21.000Z

Yeah no, we're of similar minds here. I mulled over making sure the remove runs on the same tick as the trigger, rather than waiting for the callback to complete - I think this is more correct anyway and might ninja it in there, as a digression - but you should still be able to trigger this for a synchronous call.