ReadableStream and in-parallel/event-loop integration question
Closed this issue · 7 comments
I have question regarding the use of streams in fetch, in the specific case of transmit the body
of a request.
So I think in all cases, when "transmit the body" is called, we are on "in-parallel" steps of the spec, hence I think we shouldn't be directly interacting with javascript objects, as described in https://html.spec.whatwg.org/multipage/#event-loop-for-spec-authors:in-parallel
So I'm wondering how, when transmitting the body of a request from parallel steps, the spec then says to:
- get a reader for the stream
- Start reading chunks from it until it is done or errored.
Wouldn't this require interacting with the corresponding javascript stream, which could consists of an arbitrary number of other streams piped/transformed into it?
For example, let's say the JS wants to upload some data to a server, which would be the end-result of a pipeline of streams. When "transmitting the body" of the request, we would effectively have to pull chunks from that pipeline and therefore run the corresponding javascript on a event-loop, correct?
Would it perhaps be more correct to queue a task on the event-loop which would start reading chunks from the stream, and for each chunk queue "in-parallel" steps back on fetch that would transmit those chunks over the network?
Similarly, when http-network-fetch
enqueues bytes to the response stream, shouldn't this also happen via a task queued on the event-loop(since otherwise the queuing is done immediately from parallel steps)?
Since enqueuing calls into the strategySizeAlgorithm
of the controller, while that initial strategy is set by the user-agent here, if the response body passed to the JS via one of the queued fetch tasks has been piped by the JS to other streams, I guess their strategy size will be called into as well at that point(meaning we're interacting with JS objects from parallel steps)?
Wouldn't this require interacting with the corresponding javascript stream, which could consists of an arbitrary number of other streams piped/transformed into it?
I think so, yes.
Would it perhaps be more correct to queue a task on the event-loop which would start reading chunks from the stream, and for each chunk queue "in-parallel" steps back on fetch that would transmit those chunks over the network?
That seems like a good solution. I suppose the current text could be considered as shorthand for doing something like that, but it's not helpful for people who actually have to implement it.
On the other hand, in Blink currently it doesn't work by queuing a task, and I worry that the difference might be visible.
On the other hand, in Blink currently it doesn't work by queuing a task, and I worry that the difference might be visible.
How does it work in Blink(I've had a look but I'm not sure)?
That seems like a good solution. I suppose the current text could be considered as shorthand for doing something like that, but it's not helpful for people who actually have to implement it.
I was mostly wondering when to start the reading workflow(acquire the reader, read a chunk via a promise, sent it over IPC to the networking when it resolves).
It could either be done when the request is first created, which would seem to assume the fetch isn't going to be terminated before it hits "transmit-body", or it can be done by queuing a task back on the event-loop as part of "transmit-body".
I've have a look at Gecko and Blink, it seems the FetchStreamReader
in Gecko roughly corresponds to the BodyStreamBuffer
in Blink.
For Gecko, it looks like the reading workflow start immediately in the request constructor, and I though it might be similar in Blink(BodyStreamBuffer::OnStateChange
also seems to be called in the constructor of BodyStreamBuffer
).
On the other hand, in Servo, I was thinking of actually queuing a task back on the event-loop from the parallel "transmit body" steps, and then start the reading inside that task, as opposed to starting when the request is created.
So it could be interesting to see if such differences could be noticeable to script, perhaps if the fetch is terminated "early", like in the same task it is initiated. If the reading were to be done in a task-queued as part of "transmit the body" in fetch, it would only be queued if the fetch hadn't been terminated before.
How does it work in Blink(I've had a look but I'm not sure)?
We do it synchronously by entering the relevant script state when we want to call read().
For Gecko, it looks like the reading workflow start immediately in the request constructor, and I though it might be similar in Blink(BodyStreamBuffer::OnStateChange also seems to be called in the constructor of BodyStreamBuffer).
I haven't traced it through fully, but I think you're right.
So it could be interesting to see if such differences could be noticeable to script, perhaps if the fetch is terminated "early", like in the same task it is initiated. If the reading were to be done in a task-queued as part of "transmit the body" in fetch, it would only be queued if the fetch hadn't been terminated before.
I was also worried about what would happen if a higher-priority task blocked the queued task, but I'm not sure that's relevant.
We do it synchronously by entering the relevant script state when we want to call read().
Ok, could that operation be wrapped inside a queued task instead?
Last question: Specifically for a request body, when in the lifetime of a fetch do you do you do this?
I was also worried about what would happen if a higher-priority task blocked the queued task, but I'm not sure that's relevant.
The way I read the spec, is that:
transmit-body
is called from parallel steps of a fetch.- It then acquires a reader, and a read a first chunk from the stream as part of Steps 3 and 4.
- It then starts, on step 5, a new set of "in-parallel" steps to actually pull the data from the promise(step 5.1.1 is "Wait for read to be fulfilled or rejected") and transmit them over the network, and read another chunk(via a new promise, see step 5.2.3) until the stream is done.
So step 5 is a bit like starting a "transmit body worker" that runs in parallel to already "in-parallel" fetch steps.
So if step 3-4 were to happen from a queued task, using the networking task-source, those steps could then queue steps back on the "transmit body worker", which could be defined as a parallel-queue.
I think step 5.2.3, the "read another chunk" part, could be moved to the queued task, which should simply keep reading chunks(through a chain of promises so not blocking the event-loop) and send each chunk back to fetch by queuing steps on the parallel queue. Alternatively, it could be left to the worker to request a new chunk by queuing a task, once a chunk has been transmitted over the network.
So in such a setup, the "main" fetch steps could keep running, since those steps happen on a different set of "in-parallel" steps from the conceptual "transmit-body worker".
Then from the event-loop perspective, if the networking task-source is used, that task could be delayed due to task-queue prioritization, however it would retain it's overall ordering versus the other fetch tasks, and a potential delay would not necessarily block the main fetch, only the "transmit-body worker".
This still leaves the question of whether this would fit into Blink and Gecko's current approach.
cc this issue with whatwg/xhr#277, because that is an actual example where if you implement reading chunks from the stream via queuing tasks on an event-loop, you can get an actual deadlock in the case of the event-loop blocking on a sync XHR.
I guess you don't have that problem if you simply "do it synchronously by entering the relevant script state when we want to call read()", unless if this means acquiring some kind of lock that would be held by the sync XHR call...
So I ended doing this in servo/servo#25873.
From this experience, I think it's perhaps not really important to "queue a task" explicitly when manipulating the stream from fetch.
In Servo this is indeed done via queuing a task(the "fetch" process sends an IPC message to "script", and a task is queued on the event-loop by the IPC "router" thread in the "script" process), however the task queuing is really just a way to "enter the relevant script state", in this case via the message-passing infra of task-queuing.
So if Chromium does this "synchronously", I think at the end of the day it doesn't really matter, and I see no need to mandate "task-queuing" to do this(the operation of reading from the stream doesn't seem to require putting on an explicit task-queue).
I guess the only thing that matters is that the read operation doesn't actually happen in parallel to the event-loop, in the sense that you can't be executing tasks and reading chunks at the very same time(since that could be noticeable from script)?
Should we close this one?