Fiber Principles: Contributing To Fiber
sebmarkbage opened this issue · 11 comments
I just wanted to document a few unique design patterns that apply to Fiber, but not necessarily anything else. I'll start here.
- You may mutate the fiber that you're working on during the `beginWork` and `completeWork` phases, but you may not have any other global side-effects. If you need a global side-effect, it has to be moved to the `commitWork` phase.
- Fiber is a fixed data structure. It shares the same hidden class. Never add fields outside of construction in `ReactFiber`.
- Nothing in the reconciler uses dynamic dispatch. I.e. we don't call a first-class function, except for user code such as ref callbacks, functional components, render methods, etc. Everything else is a static function available in a closure. I.e. use `myHelper(obj)` instead of `obj.myHelper()`. Any time we need to branch logic we use a switch statement over a `tag`, which is a number that indicates which type of object we're dealing with and which branch to take (see pattern matching).
- Many modules are instantiated with a `HostConfig` object. It is a single constructor that gets called at initialization time. This should be inlinable by a compiler.
- Nothing in Fiber uses the normal JS stack. Meaning it does use the stack, but it can be compiled into a flat function if needed. Calling other functions is fine; the only limitation is that they can't be recursive.
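As a sketch of that tag-based dispatch (the tag values, field names, and helpers below are invented for illustration; this is not React's actual source):

```js
// Hypothetical numeric tags; React uses a similar scheme internally,
// but these particular values and names are made up for this sketch.
const FunctionComponent = 0;
const HostComponent = 1;
const HostText = 2;

// All fields are assigned at construction time so that every fiber
// shares the same hidden class.
function createFiber(tag, type) {
  return { tag, type, child: null, sibling: null, return: null };
}

// A static function taking the fiber as an argument (myHelper(obj) style);
// branching is a switch over the numeric tag, not dynamic dispatch.
function beginWork(fiber) {
  switch (fiber.tag) {
    case FunctionComponent:
      return 'update function component';
    case HostComponent:
      return 'update host component';
    case HostText:
      return 'update host text';
    default:
      throw new Error('Unknown fiber tag: ' + fiber.tag);
  }
}
```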
If I can't use recursion, how do I traverse through the tree?

Learn to use the singly linked list tree traversal algorithm. E.g. parent first, depth first:

```js
let root = fiber;
let node = fiber;
while (true) {
  // Do something with node
  if (node.child) {
    node = node.child;
    continue;
  }
  if (node === root) {
    return;
  }
  while (!node.sibling) {
    if (!node.return || node.return === root) {
      return;
    }
    node = node.return;
  }
  node = node.sibling;
}
```
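To see the visit order, the loop above can be wrapped in a function and run on a small hand-built tree (the node names here are invented for illustration):

```js
function makeNode(name) {
  return { name, child: null, sibling: null, return: null };
}

// Wire children into the child/sibling/return pointer structure.
function link(parent, children) {
  let prev = null;
  for (const c of children) {
    c.return = parent;
    if (prev) prev.sibling = c;
    else parent.child = c;
    prev = c;
  }
}

// The traversal from above: parent first, depth first, no recursion
// and no explicit stack.
function walk(root, visit) {
  let node = root;
  while (true) {
    visit(node);
    if (node.child) {
      node = node.child;
      continue;
    }
    if (node === root) {
      return;
    }
    while (!node.sibling) {
      if (!node.return || node.return === root) {
        return;
      }
      node = node.return;
    }
    node = node.sibling;
  }
}

const a = makeNode('a');
const b = makeNode('b');
const c = makeNode('c');
const d = makeNode('d');
link(a, [b, d]); // a has children b and d
link(b, [c]);    // b has child c
const order = [];
walk(a, n => order.push(n.name));
// order is ['a', 'b', 'c', 'd']
```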
Why does it need to be this complicated?
- We can use the normal JS stack for this, but any time we yield in a `requestIdleCallback` we would have to rebuild the stack when we continue. Since this only lasts for about 50ms when idle, we would spend some time unwinding and rebuilding the stack each time. That is not too bad. However, everything along the stack would have to be aware of how to "unwind" when we abort in the middle of the work flow.
- It is plausible we could do this at the level of OCaml algebraic effects, but we don't currently have all the features we need and we don't get the performance tradeoffs we want out of the box atm. This is a plausible future way forward though.
- Most code lives outside of this recursion so it doesn't matter much for most cases.
- Most of what React does is in the space of what the normal stack does. E.g. memoization, error handling, etc. Using the normal stack too just makes it more difficult to get those to interact.
- Everything we put on the stack we generally have to put on the heap too because we memoize it. Maintaining the stack and the heap with the same data is theoretically less efficient.
- That said, all of these optimizations might be moot because JS stacks are much more efficient than JS heaps.
- One thing that I wanted to try was to compile React components to do work directly on these data structures, just like normal programming languages compile to make mutations etc. to the stack. I think that's where the ideal implementation of React is.
Let's just try it and see how it goes. :D
Aren't you just working around the lack of threads in the language?
Yes and no.
It is true that we don't have a good option in JavaScript to run threads - and that is a huge problem. We tried exploring various options of running in Web Workers, parallel JS, tried to propose shared immutable persistent data structures to the language, experimented with custom VM tweaks, etc. JavaScript the language isn't very suitable for this because of mutable shared runtimes like prototypes. The ecosystem isn't ready for it because you have to duplicate code loading and module initialization across workers. Garbage collectors are not as efficient as they currently are if they have to be thread safe, and VM implementors seem to be unwilling to bear the implementation cost of persistent data structures. Shared mutable typed arrays seem to be moving along, but requiring all data to go through this layer seems unfeasible in the ecosystem today. Artificial boundaries between different parts of the code base also don't work very well and introduce unnecessary friction. Even then you have lots of JS code, like utility libraries, that has to be duplicated across workers, which leads to slower start-up time and memory overhead. So, yes, threads are likely out of the question until we can target something like WebAssembly.
However, an interesting realization is that there are other benefits of the Fiber architecture that are applicable whether you have threads or not.
ComponentKit, which runs on native, does its work using threads, for example. It is able to start doing higher priority work in one thread while a lower priority thread is still running, leading to a much simpler implementation. However, there are some limitations.
You can't safely abort the background thread. Aborting and restarting a thread is not very cheap. In many languages it is also not safe, because you could be in the middle of some lazy initialization work. Even though it is effectively interrupted, you have to continue spending CPU cycles on it. One solution is to do the same thing that Fiber does - make an API that has yield points so that you can unwind the stack safely, then periodically check a flag to see if you should abort. That is effectively what Fiber does too.
Another limitation is that since you can't immediately abort a thread, you can't be sure if two threads are processing the same component at the same time. This leads to some limitations, such as not being able to support stateful class instances (like `React.Component`). Although that might be a good thing for other reasons.
Another thing that threads don't automatically buy you is the ability to resume part of the work. You can't just memoize part of the work that you've done in one thread and reuse it in another. You can certainly implement that with threads but you'd end up with similar complexity as Fiber - on top of the threads.
Threads have a slight benefit that you can start the next work slightly earlier because you can interject instructions while the background thread is still powering down. However, because our yield points are frequent enough by default we don't think that will matter for Fiber.
Therefore, yes, we are writing a more complex implementation because we're lacking threads in JavaScript, but because we're forced to deal with that we'll end up with better features than if we simply relied on threads for scheduling.
What about parallelism?
It is true that threads can get you parallelism. However, this has nothing to do with scheduling for responsiveness. You generally don't want to spend one CPU processing work that was already aborted because higher priority work arrived. That doesn't buy you anything at all.
Instead, what you really want is to calculate independent work in different threads. For example, two React siblings can be calculated in parallel threads since they're disconnected. If we had access to threads, that's what we would do. However, everything about Fiber is still valid for scheduling purposes in that architecture even if we use threads for parallelism of subtrees.
So, in conclusion, no we're not just working around limitations in the language.
NOTE: This is not to say that you should choose cooperative scheduling over threads in your project/use case. It's just that for our particular use case it made sense. Threads still make sense in many other cases.
Ok, so cooperative scheduling might have some benefits over preemptive threads but...
Couldn't you just use generator functions like other scheduling frameworks have done?
No.
There are two reasons for this.

- Generators don't just let you yield in the middle of a stack. You have to wrap every single function in a generator. This not only adds a lot of syntactic overhead but also runtime overhead in any existing implementation. It's fair that the syntax might be more helpful than not, but the perf issue still stands.
- The biggest reason, however, is that generators are stateful. You can't resume in the middle of one.
```js
function* doWork(a, b, c) {
  var x = doExpensiveWorkA(a);
  yield;
  var y = x + doExpensiveWorkB(b);
  yield;
  var z = y + doExpensiveWorkC(c);
  return z;
}
```
If I want to execute this across multiple time slices, I can just step through it. However, if I get an update to `b` when I have already completed `doExpensiveWorkA(a)` and `doExpensiveWorkB(b)` but not `doExpensiveWorkC(c)`, there is no way for me to reuse the value `x`. I.e. to skip ahead to `doExpensiveWorkB` with a different value for `b` but still reuse the result of `doExpensiveWorkA(a)`.
This is important to React since we do a lot of memoization.
It is plausible that you can add that as a layer around, but then you're really not gaining much from the use of generators.
There are also languages whose generators are designed for a more functional use case and do have this capability. JS is not one of them.
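To make the contrast concrete, here is a sketch of the heap-based alternative: each intermediate result is stored on a plain object rather than in a suspended stack frame, so a single step can be invalidated and re-run while earlier results are reused. The `doExpensiveWork*` bodies and the step layout are invented for illustration:

```js
// Stand-ins for expensive computations.
function doExpensiveWorkA(a) { return a * 10; }
function doExpensiveWorkB(b) { return b * 100; }
function doExpensiveWorkC(c) { return c * 1000; }

// All intermediate results live on the heap, not on the stack.
function createWork(a, b, c) {
  return { a, b, c, x: null, y: null, z: null };
}

// Each step is a separate unit of work that can be scheduled,
// skipped if already computed, or invalidated independently.
function stepA(work) { if (work.x === null) work.x = doExpensiveWorkA(work.a); }
function stepB(work) { if (work.y === null) work.y = work.x + doExpensiveWorkB(work.b); }
function stepC(work) { if (work.z === null) work.z = work.y + doExpensiveWorkC(work.c); }

const work = createWork(1, 2, 3);
stepA(work);
stepB(work);
// An update to b arrives mid-flight: invalidate only the dependent steps.
work.b = 5;
work.y = null;
work.z = null;
stepB(work); // reuses work.x; doExpensiveWorkA is not re-run
stepC(work);
// work.z === 1 * 10 + 5 * 100 + 3 * 1000 === 3510
```

A generator's suspended frame offers no such handle on `x`; here the partially completed state is ordinary data we can inspect and resume from.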
Ok, but what about OCaml's algebraic effects then? BuckleScript can compile them.
There is some non-trivial overhead in the JS version, but let's assume that would be ok.
To use algebraic effects in OCaml in this way, I think we would need to clone each fiber so that we can resume it. It is possible that a compiler could reuse an immutable stack frame / fiber without copying but that's not how it works now. Without that, we only get limited benefits.
We would also end up with lots of nested handlers which would add to the overhead and also add linear search time to find the top handler when we do yield.
Besides, to support the existing React API we need a set of features that overlap with the implementation of fibers, such as "return pointers", so by combining them we get some benefits.
I do have high hopes that one day we can just use effects for our use cases internally, but we could also use it to make an even simpler React API itself. However, for now, I think that it is probably easier to have lower level control even though the complexity is higher.
I do look forward to seeing other implementations that prioritize implementation simplicity or using complex compilers to solve the same problems.
Hi! Thanks for writing up these thoughts! They're great!
I realize it's been a few weeks since you wrote this, but there's one bit I didn't quite understand. I may be misreading or missing something, but under the "generator functions" section, you say:
> - The biggest reason, however, is that generators are stateful. You can't resume in the middle of it.
>
> ```js
> function* doWork(a, b, c) {
>   var x = doExpensiveWorkA(a);
>   yield;
>   var y = x + doExpensiveWorkB(b);
>   yield;
>   var z = y + doExpensiveWorkC(c);
>   return z;
> }
> ```
>
> If I want to execute this across multiple time slices I can just step through this. However, if I get an update to `b` when I have already completed `doExpensiveWorkA(a)` and `doExpensiveWorkB(b)` but not `doExpensiveWorkC(c)` there is no way for me to reuse the value `x`. I.e. to skip ahead to `doExpensiveWorkB` with a different value for `b` but still reuse the result of `doExpensiveWorkA(a)`.
>
> This is important to React since we do a lot of memoization.
Wouldn't memoization actually make this a non-issue? Assuming `doExpensiveWorkA` is a memoized function, then calling it again with the same value would be fast, effectively allowing you to "skip" the heavy work and continue to `doExpensiveWorkB`.

I don't know what these `doExpensiveWork*` functions represent, so maybe it's not possible to memoize them in this way.
It's really more of a placeholder for other work within this generator that may be expensive.
The problem with memoizing `doExpensiveWorkA` is that React doesn't have an opportunity to inject memoization there, including its local memoization strategy (using parent context and keys). You could use global memoization for that method, but then you end up with cache invalidation problems instead.
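A sketch of that cache invalidation problem with a global cache (the names here are invented): a cache keyed by argument identity misses whenever the caller recreates an equivalent props object, and nothing ties its entries to a position in the tree that could evict them:

```js
// Global cache keyed by object identity, roughly what a generic
// memoize helper gives you for non-primitive arguments.
const cache = new Map();
function memoizedExpensiveWork(props) {
  if (cache.has(props)) return cache.get(props);
  const result = { rendered: props.value }; // stand-in for expensive work
  cache.set(props, result);
  return result;
}

// Each render typically creates a fresh props object with equal contents...
const r1 = memoizedExpensiveWork({ value: 42 });
const r2 = memoizedExpensiveWork({ value: 42 });
// ...so the cache misses (r1 !== r2), and both entries stay alive
// with no tree position to tell us when they can be dropped.
```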
That makes sense. Thanks for clarifying!
> The problem with memoization of doExpensiveWorkA is that React doesn't have an opportunity to inject memoization there. Including its local memoization strategy (using parent context and keys).
I'm a little confused. Why is there no opportunity? We could do something like:
```js
const doExpensiveWorkA = lodash.memoize(function doExpensiveWorkA(a) {
  // ... and can just do the local memoization strategy in the local function ...
});

function* doWork(a, b, c) {
  var x = doExpensiveWorkA(a);
  yield;
  var y = x + doExpensiveWorkB(b);
  yield;
  var z = y + doExpensiveWorkC(c);
  return z;
}
```
@NE-SmallTown If so, I think it would work too. Maybe the problem is that it would rely too much on a global cache, or that it's not as elegant as the fiber structure?
I have to say this is an excellent explanation.
> @NE-SmallTown If so, I think it will work too. Maybe the problem is it will rely too much on global cache, or not elegant than the fiber structure?

I agree with this view. Memoization can indeed preserve state, at the extra cost of going through the memoize lookup again, plus memory. I wonder what order of magnitude the memory consumption of such an implementation would be.

I don't quite understand why generators are stateful.