viproma/coral

Race condition in coral::async::CommThread

kyllingstad opened this issue · 1 comments

The race condition occurs if an exception is thrown by a task function, and can be understood by looking at coral::async::detail::CommThreadMessagingLoop() and its caller, coral::async::detail::CommThreadBackground(). There is a promise stored inside each task function which is used to report the result of that task. There is also a promise stored on the stack of CommThreadBackground(), called statusNotifier, which is used to notify the foreground thread that the background thread has terminated. In the event of an exception, the task promise gets resolved before the status notifier (because myTask goes out of scope), and between these two events there is a short time in which the foreground thread can attempt to submit a new task and where CommThread::Execute() will not notice that anything is wrong.

This is causing sporadic failures in the coral_async tests, with error messages looking something like this:

1: [----------] 3 tests from coral_async
1: [ RUN      ] coral_async.CommThread
1: [       OK ] coral_async.CommThread (1306 ms)
1: [ RUN      ] coral_async.CommThread_void
1: /home/travis/build/viproma/coral/src/lib/async_test.cpp:264: Failure
1: Failed
1: /home/travis/build/viproma/coral/src/lib/async_test.cpp:272: Failure
1: Value of: thread.Active()
1:   Actual: true
1: Expected: false
1: [  FAILED  ] coral_async.CommThread_void (1305 ms)
1: [ RUN      ] coral_async.CommThread_BadData
1: [       OK ] coral_async.CommThread_BadData (101 ms)
1: [----------] 3 tests from coral_async (2712 ms total)

Correction: The promise does not get destroyed because the myTask object goes out of scope. When the task function gets called, it receives the promise by value as an argument. This means that the promise actually gets destroyed when the function returns or throws. In other words, the promise is already destroyed when myTask(r, ...); has returned.