raquo/Laminar

Errors in recursive graphs can cause stack overflow

Dennis4b opened this issue · 2 comments

(As discussed in Gitter.)

A working example of what we're trying to do: feeding EventBus' events back into itself, which can be useful for timed/repeating effects, and is IMHO a useful and valid approach sometimes.

    def testWorking() = {
        val bus = new EventBus[Int]()
        val curVal = Var(0)

        div(
            child <-- curVal.signal.map(cv => div(s"Value is ${cv}")),
            bus.events.flatMap(v => {
                curVal.set(v)
                EventStream.fromValue(v + 1).delay(1000)
            }) --> bus,
            onMountCallback(_ => bus.emit(1))
        )
    }

The curVal variable is updated every second.

If an error occurs in the bus.events handler (this can for example also be an Ajax request that failed), Laminar throws a too much recursion error in Transaction related code and stops working.

    def testError() = {
        val bus = new EventBus[Int]()
        val curVal = Var(0)

        div(
            child <-- curVal.signal.map(cv => div(s"Value is ${cv}")),
            bus.events.flatMap(v => {
                curVal.set(v)
                EventStream.fromValue(v + 1).delay(1000).map(nv => {
                    if (nv == 5) {
                        throw new Exception("Boom")
                    }
                    nv
                })
            }) --> bus,
            onMountCallback(_ => bus.emit(1))
        )
    }

(Note that I don't really know what I would expect to meaningfully happen in this case, but I guess Laminar's intention would be a more graceful error/catching/mitigation of this error)

raquo commented

Sorry, took me a long time to look into this. Here is the minimal reproduction:

    val owner = new TestableOwner
    val bus = new EventBus[Int]()
    val next = bus.events
        .filter(_ <= 5)
        .map { v =>
          if (v == 3) {
            throw new Exception("Boom")
          }
          v + 1
        }
    next.addObserver(bus.writer)(owner)
    bus.emit(0)

Throwing the exception causes a stack overflow, whereas without it the code completes fine. The stack overflows because the error produced by next is sent to its own source, where it's re-thrown and sent to the source again.

I haven't tested this with a delay(1000), but I expect that it would only serve to have the same stack overflow happen in slow motion.

The current behaviour is problematic because a bug in one self-contained observable graph is allowed to take down the whole Airstream application.

However, just as you, I'm not sure what the correct behaviour should be. I suppose one way around this would be to detect such a loop prior to it reaching the stack limit – essentially, enforce a smaller than usual maximum stack depth for this particular failure mode. For example, we could somehow gracefully stop propagating an error if we've thrown it the last N times. Let's suppose this will work well enough, and won't catch any legitimate use cases, although I'm not sure. This still does not save us from co-recursive errors. Need to think some more.

This is fixed in Airstream 17.x series. Transactions now have a max allowed depth of 1000 which nobody should have any reason to exceed. Infinite transaction loops like this, whether caused by the error channel or not, will be terminated when they reach this new depth limit, the offending transaction will not be executed, and an error will be sent to unhandled errors. The rest of the program will continue.

Note that the happy-path in your code has a delay() operator, and so it is not considered an infinite loop, and is not affected by the limit.

There is still a possibility of an error ping-ponging in between two event buses asynchronously, for example if I add a .delay(1000) to the snippet in my previous comment (the errors get delayed too, just like events), but it won't blow the stack or anything. I'm not super happy about that, but I don't have an approach for a library level solution, and I don't think it's unreasonable to put the burden on the user to be more careful with loops (they can do .ignoreErrors or actually handle the errors).

See discord #news channel for details on 17.x releases.