adamhaile/S

Handling Nested Computations under Branching Logic

ryansolid opened this issue · 13 comments

I hit an interesting problem that I'm gathering I'm just attacking wrong but I wanted to see what you think. This goes back to issues around trying to do conditionals with boolean statements that can run too many times without changing their results. Obviously breaking synchronicity is another option. But as you may remember I'd been creating Roots to handle disposal. I came across an issue there recently that I hadn't noticed before, I'm gathering due to manual disposal getting queued up (ie it's frozen like the rest).

Here is the simplest reproduction I have. You can understand why you wouldn't expect data to be re-evaluated again after the ternary evaluates the other way. Thoughts?

https://codesandbox.io/s/objective-tdd-dif57

The issue is when the condition turned from true to false and value is set to undefined it still evaluates the nested child and fails to find a value with length.

EDIT
I was wrong this even happens when you break synchronicity:

import S from "s-js";

S.root(() => {
  const trigger = S.data(false),
    data = S.data(null),
    cache = S.value(S.sample(trigger)),
    child = data => {
      S(() => console.log("nested", data().length));
      return "Hi";
    };
  S(() => cache(trigger()));
  const memo = S(() => (cache() ? child(data) : undefined));
  S(() => console.log("view", memo()));
  S.freeze(() => {
    trigger(true);
    data("name");
  });
  S.freeze(() => {
    trigger(false);
    data(undefined);
  });
});

This produces the same error.

EDIT 2
https://codesandbox.io/s/vigorous-cherry-vo79c shows an even simpler re-production using only a single condition and no freezing.

import S from "s-js";

S.root(() => {
  const data = S.data(null),
    cache = S.value(S.sample(() => !!data())),
    child = data => {
      S(() => console.log("nested", data().length));
      return "Hi";
    };
  S(() => cache(!!data()));
  const memo = S(() => (cache() ? child(data) : undefined));
  S(() => console.log("view", memo()));
  console.log("ON");
  data("name");
  console.log("OFF");
  data(undefined);
});

Is this just by design and we should expect these states to exist at the same time?

I've tested this now using the track method as described in #26. And using subclocks as they both seem to address this. They do not rerun the title length child computation on re-evaluation when the upstream computation does not follow that branch. Although with the track method the order of setters matters in the freeze method whereas subclocks cleanly handles the situation regardless of the order (as you'd expect).

Is something like subclocks the only way to cleanly handle this case? I have been attempting to update the subclocks implementation with the new features in the main implementation (exposing nodes, and node recycling) but I haven't quite got it working properly. Thanks for any assistance.

visj commented

Hey Ryan!
Thanks for the detailed bug report and findings! I noticed I had missed this case for my suggested fix with S.track. The issue was that pending nodes were not lifted when a node was pending disposal but not pending in any other way, yet still being stale.

I've pushed a minor fix (and used your code as a test case to validate, hope that's ok?) to my repo, and the order in which each trigger is being set in freeze should no longer be an issue. Does that bring track on par with the current subclocks functionality?

I'm glad for this issue, designing test cases to cover everything has been quite tricky.

Edit: this connects a bit to my previous discussion with Adam. I've been considering forking my version with track into its own project, as it seems as if having both track and subclocks is redundant. Would be great to hear your input Ryan, I have a few other ideas on how to bring the project forward.

Yes, that fixed my issue.

Subclocks may have some benefits, but currently, from an API standpoint, I don't have much need to expose either beyond letting people set a comparer on a Computation. ie, the downstream conditionals I've been looking at. That is probably a limited view though.

I have had to do some breaking of synchronicity a few places that I'm unsure how either approach would work. Implementing a React Suspense-like experience involved making a counter that would increment as downstream code is executed, and decrement as async responses came back. I only want to trigger the parent when entering count 0 or 1. Since the whole thing is in a computation it didn't like me incrementing the value multiple times. I ended up making it boolean signal and kept the count outside the reactivity.

visj commented

From an API standpoint, I evaluated my thoughts in #26 (comment). Generally, I've experienced that ´S.track´ more often than subclocks provide an easy way to handle conditional dependencies, which is the major lacking feature I found in S. However, I still think subclocks is a more complete solution and would be more versatile for use cases where track just wouldn't suffice.

I know that your example is what Adam would call a "bug" in your program. I've encountered similar situations myself, and it shouldn't be that hard providing a custom signal type that allows mutating the signal value multiple times for each tick, only using the finally computed value when entering the next. However, I don't know enough about this topic to foresee which consequences this may have.

Regarding async callbacks, I think that is a different area than branching logic, more in line with Adam's thoughts in #18?

On another note, it feels as if people believing in fine-grained solutions is quite scattered among many small libraries, tackling the same issues over and over. Perhaps it would be interesting to form a Gitter channel or something where people can share their findings to speed up research in this topic?

I think Adam saw Subclocks as an ability to address all the issues which I can respect. They are also the most fundamental change. I love how track's behavior is mostly completely opt in. I was having some issues getting node recycling working with Subclocks and looked over at what you were doing to find the other way to solve the issue. I hacked something together, but your latest update is much cleaner.

On another note, it feels as if people believing in fine-grained solutions is quite scattered among many small libraries, tackling the same issues over and over. Perhaps it would be interesting to form a Gitter channel or something where people can share their findings to speed up research in this topic?

I love it. I used to talk to @adamhaile pretty frequently via email to just keep the ball rolling on ideas etc.. admittedly I spend much more of my time looking at DOM api part of the equation but I've made some interesting modifications myself, like the ability to use the Computational context to support Dependency Injection very similar React Context API. I've also been playing with pausing and resuming computations, and looking at ways to simulate React Suspense (although again more on the DOM side of things). I think @luwes would also likely be interested in some sort of forum like this. We also end up chatting about these things. We've been both working on the same problem of client side hydration with fine-grained (albeit having to attack it a bit differently because of differences in DOM renderers) but this is a space where there are generally way less resources and a bit out of date. We all have been sort of re-inventing the wheel to try to bring this tech up to speed with where FrontEnd UI development is today.

visj commented

Sounds good. I'm not very active in social media in general, but I've attempted to create a forum for us to use, I named it reactive-js on Gitter. We can continue these discussions there instead of clogging up S's issue tracker!

luwes commented

Cool, great idea! Definitely interested in a forum like this, @ryansolid helped me a lot with Sinuous and using GH issues to ask (quick) questions doesn't always feel like the right place.

Hey Ryan, slow response, even for me :(. Start-up ended up not working out, now on some intense contracting work trying to build back. Such is life.

On your examples up above, if I follow what you're trying to do, the issue is that cache changes one tick after data. Since the nested computation's lifecycle follows cache, it runs one more tick than it should and sees the undefined value in data, causing the exception.

Your code is probably boiled out of a more complicated actual case that would explain what you're after, so this might not apply, but my thought would have been to do the simple thing:

const data = S.data(null);
S(() => {
    if (data()) console.log("(not) nested (any more)", data().length));
});

I think the intent of cache and its setter was to be more performant (correct me if I have that wrong), but I think it likely went the other way. Fundamentally, something still needs to run every time value changes. Adding the plumbing of a value signal and a computation that sets it doesn't change that, it just adds more plumbing. So maybe tell me more about what you're after and I can comment.

I kept parity between subclocks and the main implementation for a while, but the stuff around lazy node construction broke that :(. I'm mixed on whether lazy node construction is worth the complexity it added or not. At one point, it occurred to me that the initial case we were looking at -- avoiding constructing unneeded computations in DOM components -- would be largely eradicated just by making a single computation per component rather than per node, the way Surplus had been.

I've been trying to do a branchSample. We've talked about this before in other threads. Basically a helper function of the form:

function branchSample<T>(condition: () => Boolean, pass: () => T, fail: () => T): T

The goal is to handle branching in views. Basically only desiring to run the conditions when the state changes otherwise you recreate a bunch of DOM nodes. Of course people are free to write conditions like () => count > 5.

I've attempted to do this a couple of ways. My original approach was using S.root with early returns and handling disposal whenever the branch changed(https://codesandbox.io/s/objective-tdd-dif57). I'd also tried breaking synchronicity and using an intermediate S.value(https://codesandbox.io/s/vigorous-cherry-vo79c). We've also talked about hoisting but that is less generalizable because untravelled paths need to be executed and the data may be invalid.

In any case, a user of Solid noticed that if signal would be changed in a way that caused the branch to change resolution (true to false for example) that any computations created downstream would execute again even though their assumption would be due to the condition failing the resolution wouldn't run again. Why it fails is different in each case. In the nested root version the disposal is suspended until execution is done so it ends up running all the way through. And when synchronicity is broken as you mentioned it takes to the next tick to catch up.

The problem is that while you could just null check everything it isn't obvious since people are thinking of it like an if statement. Like calling the helper should be the null check and I tend to agree. It just isn't intuitive. So I was trying to think of ways to get this to work. Incidentally subclocks and VSJolund's track both work. It's a decent overhead for something so simple, but to me essential. I'm not sure if there are any other options.

Part of it is the actual purpose of subclocks isn't completely clear. Or even how they work in cases where signal tracks in computations in and outside of subclocks where the value is updated inside. There are several edge cases but I haven't had that ah-ha moment. The chat that was setup also included a big chunk of us discussing trying to figure it out. Ultimately the only case I was really looking at solving was the notifyOnlyOnChange version of computations.

I actually like node recycling now. My favourite example is that component scenario where passed in values may or may not be reactive. See I've done something with Solid where most contexts are actually sampled since all my control flow helper functions are sampled. I didn't like having things tracking in most zones since it could easily lead to unintentional re-renders. I suppose changing the granularity to a per template level would solve it since likely atleast something is reactive. I've kept a per expression basis so far (not even per node). I suppose it's worth experimenting with.

Shoot, I had started writing you a longer reply yesterday, but I must have closed that tab or something because it's gone :(. So this may be a little shorter.

Thanks for writing at length, because this is the first argument (or maybe just the first time I've gotten it!) that seems to be on to something. Most of the discussion around something like track has been about efficiency. Assuming all data signals are properly declared as S.data() or S.value(), the only scenario where track increases efficiency is when you have intermediate computations that are less-selective than their upstream value signals. So a graph like:

let v = S.value(1),
    p = S(() => v() < 5),
    c = S(() => p() ? "less than 5" : "more than 5");

p() is less-selective than v(), because its value doesn't always change when v()'s does.

The problem with the efficiency argument is that to avoid extra updates of cases like the above, it requires a more complicated (hence more expensive) bookkeeping algorithm. Specifically, we no longer know exactly which nodes are stale or not, as there's a new state, maybe-stale, with a new backtracking process to resolve the maybe-stale node into stale or current before we know whether it needs updating. That's a global slowdown to solve a minority of cases.

Furthermore, since we need to run p() for each change in v(), it's not a big-O efficiency gain, just a scalar improvement: the cost of running p() alone vs p() and c().

So in terms of efficiency, it's trading off two scalar factors: universally slower bookkeeping for sometimes faster updates. It's not clear to me whether that would result in a slower or faster system overall.

What I liked about subclocks is, as the README gets into, they do have the possibility of changing big-O complexity. Fundamentally, the difference is that track can only decide whether one signal (its own) does or doesn't change value. Subclocks can decide that for N signals (all the data signals created in the subclock), so they can change the overall big-O cost. That seems compelling enough to pursue, even though it also requires a higher bookkeeping cost.

But here, you're not talking about efficiency, you're talking about correctness and the "principle of least surprise." The case you describe is a subcategory of what I think of as the general problem of "dependent state": when a computation creates a piece of state, like a DOM node or a data signal, that state has a transitive dependency on the computation's sources, because if they change, it gets destroyed and re-created. That changes the less-selective scenario into one that's not just inefficient but semantically surprising.

I may try adding something like track behavior. Subclocks also introduce a maybe-stale state and an additional backtracking pass. It might be that this is an intermediate step between the two, or that both can use the same backtracking pass to resolve.

Ok, that's interesting. I look forward to seeing what can be done there. Your understanding in this area goes a lot farther than mine, so I only really get my head into it when I hit a practical scenario. It will be interesting to see which way makes sense.

On an aside, I wanted to mention I have come around on a few things. I ended up going back to pre-recycling version of S.js for the basis of Solid. As you said I think that wider than per binding or node computations are correct. I've changed to per template (JSX block) where all attributes (non-inserts) share the same computations. What I"m looking at now is actually joining them at runtime into a single computation where the Compiler can't get past component boundaries. Basically having a computation node be able to add more tracking functions at initial execution time. Obviously this doesn't apply to things that are dynamic structurally like loops or conditionals. But basically we could reconstruct the template even if broken into many components and in so wrapping them all in a single computation node. At some point simple diffing probably makes sense but it's very much like getting the best benefits of something like stage0/domc without retriggering complete top-down reconciliation all the time. Basically I think Components are not the correct boundaries. The conceptual template (structurally static piece) are. I cannot completely rely on at compile-time, but this run time approach would do the trick I think.

Secondly, I got rid of the explicit binding syntax that you always felt wasn't quite right. I'm using a heuristic like Surplus. It just happens mine has to beyond function calls include property access like state.value or state[key] to ensure that proxies or getters are tracked. I think it's fine though since literals and simple vars are never wrapped. And as you know it is pretty easy to hoist and assign to a variable (assuming the context is sampled, something I generally strive for everywhere). It's actually consistent with how computeds work in general. You access the value outside if you don't want it to be reactive. Since I've taken the approach of wrapping helpers in Components for Control Flow in Solid (the conditional being the one that started this thread) to ensure the insert points are sampled this is all around pretty safe since people generally won't be wrapping ternaries. Although I'm looking at ways to be smarter about those and boolean expressions. I don't think people always realize that when they do:

<div>{ someCondition() ? <ComponentA /> : <ComponentB />  }</div>

That now everything top level in ComponentA is being tracked. I could always just wrap every Component in a sample call but I've been considering other ways around. I think it is very weird as Component creator to be wary of whether someone might have wrapped you in a computation. You kind of want to design your component without worrying about what weird stuff someone who would use it might do.

Anyway I'm super interested in any progress you make in this subclock/track scenario.

visj commented
let v = S.value(1),
    p = S(() => v() < 5),
    c = S(() => p() ? "less than 5" : "more than 5");

Furthermore, since we need to run p() for each change in v(), it's not a big-O efficiency gain, just a scalar improvement: the cost of running p() alone vs p() and c().

So in terms of efficiency, it's trading off two scalar factors: universally slower bookkeeping for sometimes faster updates. It's not clear to me whether that would result in a slower or faster system overall.

Despite being scalar O improvement, the O's operate in different O time spaces. A more illustrative would be like this:

let v = S.value(1),
  p = S(() => v() < 5),
  c = S(() => p() ? costlyOperation() : evenCostlierOperation());
S.on(c, () => { applyCostlyDomUpdates(); } );

Only evaluating efficiency in terms of how the internal graph operations are affected misses the bigger picture of triggering unnecessary operations that might be long-running. A practical example of an operation running in a different time space could for instance be fetching remote data from the server.

My vs-bind attempt is far from optimal. Nodes should ideally only be queued a maximum of 2 times per tick: once if they are triggered by a tracker, and once if they are marked stale in a downstream dependency. This could be achievable by linking both STALE and PENDING to the node's age.

In order for them to run in topological order the update queue could be implemented as a double-ended queue, such as Deque.
Essentially you would push every node to the queue when applying data changes, and whenever you encounter a tracking node that is marked changed, you unshift each dependent node into the queue so they run in the correct order.

I have not evaluated the performance implications of switching the Queue implementation to a double-ended queue, but as long as that works the track should incur no performance overhead compared to the current implementation.

What I liked about subclocks is, as the README gets into, they do have the possibility of changing big-O complexity. Fundamentally, the difference is that track can only decide whether one signal (its own) does or doesn't change value. Subclocks can decide that for N signals (all the data signals created in the subclock), so they can change the overall big-O cost. That seems compelling enough to pursue, even though it also requires a higher bookkeeping cost.

This is the essential point: track (probably) incurs lower bookkeeping penalty than a subclock implementation, but only provides a solution for 1-N bindings, whereas subclocks solve N-N bindings.

visj commented

I created some benchmarks between double ended queues and the current, simpler Queue version, and preliminary it looks as if it does hit performance quite a bit.

If track cannot be implemented without causing too much slowdown I agree with @adamhaile that Subclocks probably is a better way forward as it solves more cases than track.