alewin/useWorker

Proposal: Support for TransferList API

zant opened this issue · 9 comments

zant commented

Based on #46.

Introduction

Transferable objects are objects whose ownership can be moved from one execution context to another (for example, from the main thread to a worker). This is useful in cases where we don't want to rely on structured cloning to pass data to a worker. Since this procedure moves data without copying it, it can provide better performance for critical applications (this is normally the case with really large objects).
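To make the difference concrete, here is a minimal sketch. It assumes the global structuredClone function and uses its transfer option as a stand-in for the postMessage hop, so the snippet runs without a real worker; the variable names are illustrative only.

```javascript
// Structured clone: the receiver gets a copy, the sender keeps its data.
const copied = new Uint8Array([1, 2, 3, 4, 5]);
const cloneResult = structuredClone(copied);

// Transfer: ownership of the underlying ArrayBuffer moves to the receiver.
const moved = new Uint8Array([1, 2, 3, 4, 5]);
const transferResult = structuredClone(moved, { transfer: [moved.buffer] });

console.log(copied.byteLength); // 5: the sender still holds its bytes
console.log(moved.byteLength); // 0: the sender's buffer is now detached
console.log(Array.from(transferResult)); // [1, 2, 3, 4, 5] on the receiving side
```

The detached array on the sending side is the price of avoiding the copy.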

useWorker currently does not provide a way to transfer these objects: for them to be transferred, we need to pass a second argument to postMessage, namely the list of objects we would like to transfer.

worker.postMessage(message, transferList);

We can see that the current implementation does not support passing a second argument to postMessage.

worker.current?.postMessage([[...fnArgs]])

Solution

Currently, the library provides a really nice way to execute functions inside a worker. Considering how it works:

const addNumber = (arr, number) => arr.map(el => el + number);

const [addNumberWorker] = useWorker(addNumber)

const arr = new Uint8Array([1, 2, 3, 4, 5]);
// All the arguments we pass go directly to the function in the worker
// So arr.buffer will be just another argument instead of being moved to the worker context
addNumberWorker(arr, 1, [arr.buffer]) // Does not transfer

As we need a way to change the default functionality of the useWorker returned function, we introduce a new option to the options object.

TransferList option

The default will be false, and we opt-in to transferList.

// We pass a new option to the worker
const [addNumberWorker] = useWorker(addNumber, { transferList: true }); 

Usage

Option 1

A first option could be to make the usage similar to how the native postMessage works. So when the worker gets called, we will pass the data property from the MessageEvent to the function.

// Inside the worker
onmessage = e => addNumber(e.data) // Not real implementation

// When we call, we pass arguments as we would with postMessage
const data = { arr, number: 1 }
addNumberWorker(data, transferList);

However, this restricts function implementations to a single argument, so developers will need to refactor their function a bit if they want to opt in:

// Just one argument, they'll have to destructure it
const addNumber = ({arr, number}) => arr.map(el => el + number)

So we can consider this other approach.

Option 2

// Passing the arguments as elements of an array
addNumberWorker([arr, 1], [arr.buffer]);

With the above, we do not pass the data object to the function they provide, instead we collect the elements of the array and pass them as arguments of the function they provide. This way, developers can use the same function without any code changes with the added functionality to pass a transferList.
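A rough sketch of how Option 2 could behave. The withArgArray wrapper is hypothetical (it is not the library's implementation), it runs synchronously for brevity, and structuredClone stands in for worker.postMessage(args, transferList).

```javascript
const addNumber = (arr, number) => arr.map((el) => el + number);

// Hypothetical Option 2 wrapper: first parameter is the argument array,
// second parameter is the transfer list.
const withArgArray = (fn) => (args, transferList = []) => {
  // Stand-in for the postMessage hop; listed buffers are moved, not copied.
  const received = structuredClone(args, { transfer: transferList });
  // "Worker side": spread the array back into positional arguments.
  return fn(...received);
};

const arr = new Uint8Array([1, 2, 3, 4, 5]);
const addNumberWorker = withArgArray(addNumber);
const result = addNumberWorker([arr, 1], [arr.buffer]);

console.log(Array.from(result)); // [2, 3, 4, 5, 6]
console.log(arr.byteLength); // 0: arr.buffer was transferred
```

The function passed in keeps its original two-parameter signature; only the call site changes.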

But we can also argue that it feels weird to pass arguments as an array; besides, it's not really a widely used pattern in JavaScript. So we can consider something else.

Option 3

const addNumber = (arr, number) => arr.map(el => el + number);

const [addNumberWorker] = useWorker(addNumber, { transferList: true })

const arr = new Uint8Array([1, 2, 3, 4, 5]);
// Using it as we do today, with an added argument
addNumberWorker(arr, 1, [arr.buffer]);

Here the last argument will always be the transferList when the option is enabled. I think this will be a familiar approach for current users. However, we really need to think about edge cases, as relying on argument position can be fragile if not implemented carefully.
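One way to picture the fragility is the sketch below. splitArgs is a hypothetical helper, written under the assumption that with transferList enabled the last argument is always treated as the transfer list.

```javascript
// Hypothetical helper: split call arguments into function args and a transfer list.
const splitArgs = (args, { transferList = false } = {}) => {
  if (!transferList) return { fnArgs: args, transfer: [] };
  const last = args[args.length - 1];
  if (!Array.isArray(last)) {
    throw new TypeError("transferList is enabled: the last argument must be an array");
  }
  return { fnArgs: args.slice(0, -1), transfer: last };
};

const arr = new Uint8Array([1, 2, 3, 4, 5]);

// Intended usage: the trailing array is the transfer list.
const ok = splitArgs([arr, 1, [arr.buffer]], { transferList: true });
console.log(ok.fnArgs.length, ok.transfer.length); // 2 1

// Edge case: the function's own last parameter is an array and the caller
// forgot the transfer list, so a real argument is silently swallowed.
const ambiguous = splitArgs([arr, [10, 20]], { transferList: true });
console.log(ambiguous.fnArgs.length, ambiguous.transfer); // 1 [ 10, 20 ]
```

The second call does not fail inside the helper; the mistake would only surface later, when postMessage rejects the non-transferable entries or the function receives too few arguments.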

Conclusion

As you can see, I haven't reached a conclusion myself. Although I personally like Option 3 more, there are some considerations we need to make to introduce this feature in a way that both makes sense for developers and doesn't feel foreign to how the hook works today.

I'd love to hear your thoughts! Thanks for reading!

Note: Although browser compatibility seems to be fine, we may want to provide feature-detection errors so we can spot bugs on older versions and relieve developers who use this hook of that burden.

Thanks for the explanation 👍, I have never used Transferable before.

I was looking at a similar library, greenlet (a very clean approach), trying to understand how they managed this case.

Greenlet checks if the variable is a "Transferable"

const isTransferable = val => (val instanceof ArrayBuffer || val instanceof MessagePort || val instanceof ImageBitmap);

In this way, the library takes care of the buffer, without having to specify an additional parameter.
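As a hedged sketch of how such a check could feed a transfer list: collectTransferables is a hypothetical helper name, and the typeof guards are an added assumption so the snippet also runs where MessagePort or ImageBitmap are not defined.

```javascript
// Same idea as the check above, guarded for environments lacking some globals.
const isTransferable = (val) =>
  (typeof ArrayBuffer !== "undefined" && val instanceof ArrayBuffer) ||
  (typeof MessagePort !== "undefined" && val instanceof MessagePort) ||
  (typeof ImageBitmap !== "undefined" && val instanceof ImageBitmap);

// Hypothetical helper: keep only the top-level arguments that can be transferred.
const collectTransferables = (args) => args.filter(isTransferable);

const arr = new Uint8Array([1, 2, 3, 4, 5]);

// A typed array is a view, not an ArrayBuffer, so it is not picked up...
console.log(collectTransferables([arr, 1]).length); // 0
// ...while its underlying buffer is.
console.log(collectTransferables([arr.buffer, 1]).length); // 1
```

Note that with this check alone, a call such as addNumberWorker(arr, 1) would still be cloned rather than transferred, because only arr.buffer passes the instanceof test.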

What do you think?

zant commented

Hey @alewin! That's a great suggestion! I really like it, it solves most of the complexity and design of the API. Are you thinking of adding this as a default, or as an opt-in with the new field in the options?

I'm asking because sometimes you don't want the optimization: the array on the main thread becomes completely empty once its buffer is transferred, which may or may not be desirable.

An example of this is when you want to render something based on the current state (a typed array) and then pass that array's buffer to the worker to calculate something, but without getting rid of the array on the main thread because it is still useful for the current task.

With this in mind, I think a nice approach can be to, either:

  • Add as a default optimization but with an option to deactivate
  • Add it as an opt-in in the options object

Note: In general, one can implement a workaround for the example I mentioned, but still, I think it will be nice to give more options to developers :)
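One possible workaround, sketched under the assumption that an extra copy on the main thread is acceptable: copy the typed array first and transfer only the copy. structuredClone again stands in for the postMessage hop, and the variable names are illustrative.

```javascript
// Current state that the main thread still needs for rendering.
const state = new Uint8Array([1, 2, 3, 4, 5]);

// slice() returns a new typed array backed by its own ArrayBuffer.
const copyForWorker = state.slice();

// Transfer only the copy; the original buffer is untouched.
const received = structuredClone(copyForWorker, { transfer: [copyForWorker.buffer] });

console.log(state.byteLength); // 5: still usable on the main thread
console.log(copyForWorker.byteLength); // 0: the copy was moved away
console.log(Array.from(received)); // [1, 2, 3, 4, 5]
```

This trades one copy for keeping the state intact, which is why an explicit option still seems useful.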

Yes, I agree, it's better to have one more option and let the user decide 👌

const [addNumberWorker] = useWorker(addNumber, { transferList: false })

zant commented

Awesome! Then I think we are ready to move forward, great discussion 🎉

Are you good with starting the implementation? 😄

Yes, it was helpful! 😅 I'll take care of it!

zant commented

Great, looking forward to it 👌

While I was implementing this feature, I had another doubt...
If we want to check whether what is passed to the web worker is a Transferable, it would be difficult to analyze all the parameters, or even a single object with nested Transferables.

Example:

worker.current.postMessage(demo, [demo.p1.buffer1, demo.p1.p3[0].p4.buffer2]) // pseudo-code implementation

where demo is:

const demo = {
  p1: {
    buffer1: [1, 2, 3], // ArrayBuffer
    p3: [
      {
        p4: {
          buffer2: [1, 2, 4], // ArrayBuffer
        },
      },
    ],
  },
  buffer3: [1, 2, 3], // ArrayBuffer
}

Also, I can't enable the transfer only for buffer2.
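To illustrate what a deep search would involve, here is a sketch only: findTransferables is a hypothetical helper, it looks for ArrayBuffer instances exclusively, and it treats typed-array views as leaves (all assumptions made for brevity).

```javascript
// Hypothetical recursive search for ArrayBuffers nested anywhere in a value.
const findTransferables = (value, found = new Set(), seen = new Set()) => {
  if (value instanceof ArrayBuffer) {
    found.add(value);
    return found;
  }
  // Primitives, typed-array views and already-visited objects are not descended into.
  if (value === null || typeof value !== "object" || ArrayBuffer.isView(value) || seen.has(value)) {
    return found;
  }
  seen.add(value); // guards against circular references
  for (const child of Object.values(value)) {
    findTransferables(child, found, seen);
  }
  return found;
};

const demo = {
  p1: {
    buffer1: new ArrayBuffer(3),
    p3: [{ p4: { buffer2: new ArrayBuffer(3) } }],
  },
  buffer3: new ArrayBuffer(3),
};

const transferList = [...findTransferables(demo)];
console.log(transferList.length); // 3: every nested buffer is collected
```

The walk visits every property on every call, and it is all-or-nothing: there is no way here to say "transfer buffer2 but clone the others".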

Your Option 3 should solve these problems.

zant commented

Hey @alewin! Sorry couldn't answer before, I found myself quite busy these days 😅

I see your concern. And I think we then have some considerations to make, in the sense that, we can still go ahead with your suggestion (I like it more than op3). But if the user wants some objects to get transferred, they should pass them as arguments. For example:

someWorkerFunction(demo, demo.buffer1, demo.buffer2)

We then check all the arguments and transfer the ones that are transferable. I think it is a bit cleaner than having to rely on argument order as op3 suggests, because if we transfer based on argument type, one can for example:

someWorkerFunction(demo.buffer1, demo, demo.buffer2)

It's still going to work, with the added benefit that they'll need to write the variable just once. For example, if we implement op3, they'll need to write the same variable twice:

// If we rely on the last argument being the transferList,
// and they want to use an array buffer in the function as an argument
// They'll need to pass it as an argument and also in the transferList
someWorkerFunction(demo, demo.buffer1, [demo.buffer1])

And as we can see above they get the transfer and argument for free if we go with your suggestion.

Another argument can be if users do not want those extra arguments on their functions. However, this will be easy to work around, because they can just pass the function they want to useWorker with the first arguments being what they actually want in the function and then ignore the others that they pass to get transferred.

// On call
someWorkerFunction(demo, demo.buffer1, demo.buffer2)

// They ignore the other arguments they passed just to get them transferred
const someFunction = (demo) => doSomething(demo) 

In a way, we move the need to rely on argument order to the user side. They can implement their functions and pass arguments the way they want, keeping in mind how useWorker works. And of course, if they have a big array of things to transfer and do not want to write them out as arguments, spread syntax is available:

const transferList = [demo.buffer1, demo.buffer2]
someWorkerFunction(demo, ...transferList)

It just feels more natural to me, "Ah! So every argument that has a transfer type gets transferred. I can play with this...". On the other hand having a magic last argument feels weird.

And lastly, if we go with this, we will need pretty good documentation, making clear that:

Every transferable argument passed to a function returned by useWorker will be transferred (zero-copy), but only if it is passed explicitly as an argument, just as one would list it when using postMessage.
To opt out of this optimization, one can turn it off with transferList: false in the options object.
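The documented rule could be sketched roughly as follows. callLikeUseWorker is a hypothetical, synchronous stand-in for the function returned by useWorker, structuredClone replaces the postMessage hop, and the transferable check is simplified to ArrayBuffer only.

```javascript
// Simplified check; a fuller version would also cover MessagePort and ImageBitmap.
const isTransferable = (val) => val instanceof ArrayBuffer;

// Hypothetical wrapper: transfer every top-level transferable argument unless opted out.
const callLikeUseWorker = (fn, { transferList = true } = {}) => (...fnArgs) => {
  // A Set removes duplicates: postMessage rejects a transfer list that repeats an object.
  const transfer = transferList ? [...new Set(fnArgs.filter(isTransferable))] : [];
  const received = structuredClone(fnArgs, { transfer });
  return fn(...received);
};

// The function only uses its first parameter; the extra argument exists to be transferred.
const sumBuffer1 = (demo) => new Uint8Array(demo.buffer1).reduce((acc, n) => acc + n, 0);

const demoA = { buffer1: new Uint8Array([1, 2, 3]).buffer };
const totalA = callLikeUseWorker(sumBuffer1)(demoA, demoA.buffer1);

const demoB = { buffer1: new Uint8Array([1, 2, 3]).buffer };
const totalB = callLikeUseWorker(sumBuffer1, { transferList: false })(demoB, demoB.buffer1);

console.log(totalA, demoA.buffer1.byteLength); // 6 0: transferred
console.log(totalB, demoB.buffer1.byteLength); // 6 3: cloned, original kept
```

Because the buffer is in the transfer list, the reference nested inside demo arrives as the transferred buffer too, which is what lets the function ignore the extra argument.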

I think both have some tradeoffs, for example, one can argue that op3 is more explicit and thus better.

In the end, there are also a lot of edge cases to consider that we will only start to see when we actually ship the feature haha 😅

What do you think?

Hey @alewin! Sorry couldn't answer before, I found myself quite busy these days

No problem, unfortunately I'm busy these weeks too.

In the end, there are also a lot of edge cases to consider that we will only start to see when we actually ship the feature haha 😅

Yes... to consider all possible edge cases, we need to ship this feature 😅

I'm working here: https://github.com/alewin/useWorker/tree/feature/ISSUE-47-transferable