python-trio/trio

An equivalent of __context__, but for exceptions that got pre-empted by another exception unwinding the stack


As @Nikratio points out in python-trio/pytest-trio#30, Trio's way of propagating exceptions can lead to obscure results when you have multiple communicating tasks where a crash in one of them triggers a crash in another. Depending on timing and the presence of checkpoints in the first task's cleanup clauses, you can end up in a situation where the second exception propagates up to a common nursery, triggers a cancellation of the original task, and this cancellation wipes out the original exception, leaving you scratching your head over the root cause.

Here's a simplified version of the original example:

import trio, trio.testing

async def echo_server(server_stream):
    try:
        async with server_stream:
            data = await server_stream.receive_some(10)
            await server_stream.send_lal(data)          # <--- notice the typo
    finally:
        # Pretend we had some other cleanup to do
        await trio.hazmat.checkpoint()
        await trio.hazmat.checkpoint()

async def echo_client(client_stream):
    await client_stream.send_all(b"x")
    assert await client_stream.receive_some(1) == b"x"

async def main():
    client_stream, server_stream = trio.testing.lockstep_stream_pair()
    async with trio.open_nursery() as nursery:
        nursery.start_soon(echo_server, server_stream)
        nursery.start_soon(echo_client, client_stream)

trio.run(main)

The exact details might change depending on the Trio version. For me right now on 0.3.0, putting one checkpoint in the finally block gives me a MultiError([AssertionError, AttributeError]), and putting two checkpoints gives me just a plain AssertionError – the AttributeError has disappeared.

There doesn't seem to be any way to actually preserve the AttributeError here – the echo_server task caught it, and then got cancelled while handling it. In this case it would eventually have propagated out, but in general there's no way to know that. Maybe it was caught for real. In @Nikratio's example, it wasn't going to propagate further, but was going to get logged.

However, taking a page from Python 3's implicit exception chaining, we can at least preserve the information that the AssertionError preempted the AttributeError, so the information is available later when trying to figure out wtf happened. At least in principle.
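For reference, here is a minimal sketch of the implicit chaining being referred to, using only plain Python and no Trio. The function name demo_implicit_chaining is made up for illustration. It shows that an exception raised while another one is unwinding through a finally block records the displaced exception on __context__, which is the same mechanism that would let a Cancelled raised inside echo_server's cleanup remember the AttributeError.

```python
def demo_implicit_chaining():
    """Raise a second exception while the first is still unwinding."""
    try:
        try:
            raise AttributeError("original failure")
        finally:
            # Raising here replaces the in-flight AttributeError,
            # much like a cancellation arriving during cleanup would.
            raise AssertionError("raised during cleanup")
    except AssertionError as exc:
        return exc


exc = demo_implicit_chaining()
# Python stored the displaced exception on __context__ automatically.
print(type(exc.__context__).__name__, "-", exc.__context__)
```

The displaced exception is still reachable here only because both exceptions travelled through the same stack; the problem in this issue is that the two exceptions live in different tasks.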

One possible approach:

  • Implement #285, so that nurseries can peek at the Cancelled exceptions that were used to unwind other branches of the stack

  • If any of these Cancelled exceptions have __context__ values, gather those up

  • Attach them to the exception that the nursery re-raises, in a new __preempted__ attribute or similar (it's tempting to wedge this into __context__ instead of making something new, but I don't think we can really do that meaningfully)

  • Update our traceback printing code to check for __preempted__, and say something about it
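The gather-and-attach steps above could look roughly like the following. This is a hedged sketch, not Trio code: FakeCancelled is a stand-in for trio.Cancelled so the example runs without Trio, and collect_preempted, attach_preempted, and simulate_cancelled_cleanup are hypothetical helpers. It also assumes the nursery can already see the Cancelled exceptions, which is what #285 would provide.

```python
class FakeCancelled(BaseException):
    """Hypothetical stand-in for trio.Cancelled (assumption, not Trio's class)."""


def collect_preempted(cancelled_excs):
    """Gather whatever each Cancelled displaced while unwinding its task."""
    return [c.__context__ for c in cancelled_excs if c.__context__ is not None]


def attach_preempted(exc, cancelled_excs):
    """Record the displaced exceptions on the exception the nursery re-raises."""
    preempted = list(getattr(exc, "__preempted__", []))
    preempted.extend(collect_preempted(cancelled_excs))
    exc.__preempted__ = preempted
    return exc


def simulate_cancelled_cleanup():
    """Mimic echo_server: crash, then get cancelled inside the finally block."""
    try:
        try:
            raise AttributeError("no attribute 'send_lal'")
        finally:
            raise FakeCancelled()
    except FakeCancelled as cancelled:
        return cancelled


cancelled = simulate_cancelled_cleanup()
winner = attach_preempted(AssertionError("client assertion failed"), [cancelled])
print([type(e).__name__ for e in winner.__preempted__])
```

A flat list like this is the simplest possible shape for __preempted__; it does not yet say where in the stack the preemption happened.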

One tricky part is how to record __preempted__, given that we can have complicated situations: the same exception passing upwards through multiple nurseries and preempting some exceptions at each one, or a MultiError that preempts some other exceptions, after which part of the MultiError gets caught and it converts back into a regular single exception.

Idea: make __preempted__ a dict mapping frames to sets of preempted exceptions – with the idea that the frame records where during the unwinding the preemption took place. When we filter a MultiError, preserve and combine the __preempted__ from MultiError objects that get collapsed. When printing, make a note at the point in the stack where the preemption happened. Maybe the default is that we print a little note like "(at this point, preempted: RuntimeError, ValueError)" and give an envvar that can be set to get full details?
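To make the data structure concrete, here is a rough sketch of the frame-keyed mapping and the merge step, under some assumptions: record_preemption, merge_preempted, and format_note are hypothetical names, the frame is obtained with CPython's sys._getframe purely for demonstration, and plain exceptions stand in for the MultiError members being collapsed.

```python
import sys


def record_preemption(exc, frame, preempted):
    """Note that `exc` displaced the `preempted` exceptions at `frame`."""
    mapping = getattr(exc, "__preempted__", None)
    if mapping is None:
        mapping = exc.__preempted__ = {}
    mapping.setdefault(frame, set()).update(preempted)


def merge_preempted(target, sources):
    """Combine the mappings of collapsed exceptions onto the survivor."""
    for src in sources:
        for frame, excs in getattr(src, "__preempted__", {}).items():
            record_preemption(target, frame, excs)


def format_note(excs):
    """Build the short one-line note for a traceback entry."""
    names = sorted(type(e).__name__ for e in excs)
    return "(at this point, preempted: {})".format(", ".join(names))


frame = sys._getframe()  # stand-in for the nursery's frame
first = AssertionError("first")
second = KeyError("second")
record_preemption(first, frame, [RuntimeError("x")])
record_preemption(second, frame, [ValueError("y")])

survivor = AssertionError("survivor")
merge_preempted(survivor, [first, second])
print(format_note(survivor.__preempted__[frame]))
```

One design caveat with this shape: holding frame objects as dict keys keeps those frames (and their locals) alive for as long as the exception lives, so a real implementation might prefer a lighter-weight key.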

Regarding #285, it might make sense to apply this logic to TooSlowErrors too... maybe that'd just be clutter though, dunno.