danvratil/qcoro

Coroutine cancellation

Opened this issue · 10 comments

It is desirable that coroutines can be cancelled from the outside. So far I came up with four possible solutions, each with their own set of advantages and drawbacks.

1. Using a stop token

Inside a coroutine, user would obtain a stop token (QCoro::thisCoroutine::stopToken()) for the current coroutine. The coroutine caller could cancel the coroutine through the Task<> object (calling task.cancel()).

➕ Gives the programmer more control over where to terminate the cancelled coroutine, allowing for additional cleanup of resources
➖ The programmer must manually check the stop token after each suspension point - failing so makes the coroutine "uncancellable"
➖ If a coroutine is cancelled because it would never be resumed (e.g. a coroutine awaiting a network message gets cancelled because the network is down, so the message will never arrive and the coroutine will never resume), the cancellation will have no real effect as the coroutine will remain suspended, never reaching the code that checks the stop token. I would consider this a memory leak.
➖ Non-void coroutine must return something, which may make the interface more complex.

QCoro::Task<> myCoro() {
    ...
    const auto result = co_await otherCoro();
    if (QCoro::thisCoroutine::stopToken().stop_requested()) {
        ...
        co_return;
    }
    ...
}

2. Using stop token and artificially resuming the cancelled coroutine

This works the same as the case above, but it tries to solve the last ➖ by artificially resuming the suspended coroutine after it's been cancelled. This allows the coroutine to reach the code that checks the stop token and terminate.

➕ Coroutine termination is always guaranteed (unless the programmer makes a mistake and forgets to handle cancellation)
➖ If the cancelled coroutine is awaiting a result of another awaitable, artifically resuming it requires that we also provide some empty result (so all awaitables would have to have std::optional<T> or std::variant<T, QCoro::Cancelled> return type). This would make the API annoying to use, especially if T would be something like std::expect<U, E>, requiring the caller to first check T or Cancelled_t and if T then check U or E.
➖ Same as with the previous case, if the coroutine is non-void, it must return something.

QCoro::Task<> myCoro() {
    ...
    const auto result = co_await otherCoro();
    if (std::holds_alternative<QCoro::Cancelled>(result)) {
        ...
        co_return;
    }
    ...
}

3. Destroying the coroutine

When user cancels a coroutine, we could simply destroy the coroutine (std::coroutine_handle::destroy()) which takes care of destroying the coroutine's stack, including properly invoking destructors. The major danger of this approach is that it's easy to leak manually aquired resources if the resource lifetime crosses the suspension point.

➕ Always works
➕ Can be implemented completely inside Task<>
➖ Programmer loses control over the cancellation
➖ No chance for manual cleanup before terminating the coroutine can lead to memory leaks

❗Destroying the coroutine state does not automatically cascade to nested coroutines, and it may be tricky/impossible to implement. At best we would need to track currently awaited coroutines inside the cancelled coroutine's promise.

4. Throwing an exception from suspension point

When a suspended coroutine is canceled, it would be resumed, and await_resume() at the coroutine's suspension point would throw an exception (e.g. QCoro::CoroutineCanceled)

➕ Always works, as long as the awaitable that the coroutine is awaiting supports it
➕ If the programmer needs to handle cancellation in a special way, they can catch the exception, perform any cleanup, and then rethrow the exception to continue with cancelation.
➕ Can fallback to 3) if exceptions are disabled
➖ Requires support in custom awaitables
➖ Exceptions are non-standard pattern in the Qt ecosystem
➖ Requires exception support enabled

QCoro::Task<> myCoro() {
    ...
    try {
        const auto r = co_await otherCoro();
    } except (const QCoro::Cancelled &cancel) {
        ...
        throw cancel;
    }
    ...
}

I'm currently most in favor of 3, since it's the easiest to implement and has a minimal impact on user code. We could always scale up from there. I have yet to do some PoC implementation to verify whether it is workable.

With the pros and cons as presented here, I think 3 is probably the best option.

Experimental implementation (including basic cancellation cascading) is implemented in feature/cancellation branch.

The cascading is needed when the object being co_awaited by the cancelled coroutine is a QCoro::Task<T>. The clever trick here is that we can intercept the QCoro::Task<T> in the current coroutine promise's Promise::await_transform() overload for QCoro::Task<T> just before the coroutine is suspended. When the coroutine is canceled, we first check if there's an intercepted QCoro::Task and if it is, we cancel it first, then destroy the current coroutine. This will lead to cascade of cancellations.

This is not needed for types that directly implement the Awaitable interface - those should be living on the current coroutine's stack and thus will be destroy when the cancelled coroutine's state is destroyed.

Newly discovered issue:

If I hold a reference to a Task and use it to cancel the task, how does this affect .then() continuations someone else may have attached to the Task elsewhere?

Hmm this is a good question. I guess they would never get called, but maybe there could be an .when_canceled() or on_canceled() continuation as well?

Not good enough, imagine case like this:

QCoro::Task<T> task = startAnAsyncOperation();
// Schedule cancellation of the operation if it doesn't complete within 100ms
QTimer::singleShot(100ms, [&task]() { task.cancel(); });
const int r = co_await task.then([]() { 
   co_await something();
   co_return 42;
});

If the task takes too long and gets cancelled, the then() continuation doesn't get invoked, but then how should the co_await behave? It must return something...

I'm starting to think we may have to resort to exceptions after all, as there seems to be no other way to propagate the cancellation in all directions. With the approach 3) it looks like all I can do is to cascade a single caller-callee chain of coroutines, but any more complex scenario becomes harder and harder...

Alternative option would be to make cancellation an explicit feature:

Introduce CancellableTask<T> which resolves to Result<T> where Result is a wrapper that either has the value T or indicates cancellation.

It would be possible to only co_await a CancellableTask inside of CancellableTask, awaiting a Task inside a CancellableTask would be a compilation error. It would be OK to await a CancellableTask inside a Task, though. This would make the problem described in the comment above go away, since the then() continuation would return a CancellableTask as well, so the co_await would return a cancelled Result<T>.

The constraint of not being able to co_await a Task inside of a CancellableTask is fairly limiting, though - if there's a 3rd-party library whose API has non-cancellable Tasks, it would be unusable from any cancellable code (which might be coming from another 3rd party library). Possible solution would be to introduce Awaitable<T> QCoro::detachCoroutine(Task<T> &&t) to force the programmer to consciously and explicitly say "if the current coroutine gets cancelled, this coroutine t will not be cancelled".

And obviously you would only be able to use || with CancellableTasks or detached coroutines.

I see now, thanks for the explanation.

I think your proposal for CancellableTask<> is reasonable, although it is quite limiting. I think for what you're getting with it though, it offsets the cost of using it.

To provide some update (after a longer pause), I have working CancellableTask<> implementation, but I'm struggling with some memory issues in the DetachedTask. I will probably clean up the current CancellableTask<> code and merge it without the DetchedTask<> support, and introduce that later.

Another issue that I haven't realized before is what to do with the existing QCoro API - all the wrappers in QCoro return the uncancellable Task<>. While most can definitely be canceled, changing the signature of all the wrappers would be an API incompatible change. I still need to figure out how to handle this....

While most can definitely be canceled, changing the signature of all the wrappers would be an API incompatible change. I still need to figure out how to handle this....

The current version of qcoro being 0.6 and not 1.x, an incompatible API change is perfectly acceptable, IMHO (rather than a complicated solution).

I think it would also be nice to have an ability to attach CancellableTask to QObject, so that it would be automatically cancelled task when QObject is destroyed (similarly to how signal/slot connections are automatically cancelled when sender or receiver QObjects are destroyed).