Make read requests abortable
domenic opened this issue · 15 comments
Noted by @reillyeon in WICG/serial#112 (comment).
For some cases it would be nice to dequeue (and possibly abort?) a given read request, without canceling the stream entirely. That is,
```js
const controller = new AbortController();
// Assume "rs" contains chunks A and B.
const reader = rs.getReader();
const promise1 = reader.read();
const promise2 = reader.read({ signal: controller.signal });
const promise3 = reader.read();
controller.abort();
```
The intent here is that `promise1` gets fulfilled with chunk A, and `promise3` gets fulfilled with chunk B.
Some questions:
- What does this do to `promise2`? Probably reject it with an `"AbortError"` `DOMException` (like in other `AbortSignal` scenarios).
- What happens if abort is signaled after the read request is already fulfilled? Probably nothing (like in other `AbortSignal` scenarios).
- Does this do more than just dequeue the read request? For example, could it be communicated all the way down to the underlying source, I guess via `pull()`? I don't think this helps much, but it's worth checking...
This is looking relatively straightforward to me to spec, and the use case @reillyeon presents is a good one. Should we just go for it?
The desired behavior seems analogous to the `preventCancel` and `preventAbort` options to `pipeTo()` and `pipeThrough()`.
IIRC calling `reader.cancel()` will cause `promise2` to resolve with `done` set to `true`. Using the `AbortSignal` should probably cause it to reject with an `"AbortError"` `DOMException` instead, since the stream isn't really "done".
If we added an `AbortSignal` parameter to `pull()` then it would make sense to signal it in this case, but only if there were no other reads queued.
> - What does this do to `promise2`? Probably reject it with an `"AbortError"` `DOMException` (like in other `AbortSignal` scenarios).
Yes, that seems most appropriate. We do the same for `rs.pipeTo(dest, { signal })`.
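For reference, the existing `{ signal }` option on `pipeTo()` can be seen in a small sketch (runnable in Node 18+, where the streams classes are global; the stream contents here are illustrative):

```js
// Aborting a pipe via pipeTo's existing { signal } option.
// The pipe promise rejects with the signal's abort reason,
// which defaults to an "AbortError" DOMException.
const rs = new ReadableStream({ start(controller) {} }); // never produces chunks
const ws = new WritableStream({ write(chunk) {} });
const ac = new AbortController();

const pipePromise = rs.pipeTo(ws, { signal: ac.signal });
ac.abort();

const err = await pipePromise.catch((e) => e);
console.log(err.name); // "AbortError"
```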
While we're on the topic: I think `rs.pipeTo(dest, { signal })` should also use the given signal for all of its read requests. That will allow the pipe to shut down faster.
> - What happens if abort is signaled after the read request is already fulfilled? Probably nothing (like in other `AbortSignal` scenarios).
Yup, that makes sense.
I think it's also important to guarantee that if we do reject the read request with an `AbortError`, then the next read request will receive the "missed" chunk. That is, aborting any read request does not "lose" any chunks. (We should have some tests for this.)
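To see why this guarantee matters, here is a sketch (runnable in Node 18+; `naiveAbortableRead` is a hypothetical userland helper, not a proposed API) of how a `Promise.race`-based abort loses chunks — which a native `{ signal }` option must not do:

```js
// A naive userland "abortable read" built with Promise.race.
// The race rejects the outer promise, but the inner read() stays in the
// stream's read-request queue and still consumes a chunk — silently lost.
let enqueue;
const rs = new ReadableStream({
  start(controller) {
    enqueue = (chunk) => controller.enqueue(chunk);
  },
});
const reader = rs.getReader();
const ac = new AbortController();

function naiveAbortableRead(reader, signal) {
  return Promise.race([
    reader.read(), // still pending in the stream's read-request queue
    new Promise((_, reject) => {
      signal.addEventListener("abort", () =>
        reject(new DOMException("The read was aborted", "AbortError")));
    }),
  ]);
}

const p1 = reader.read();
const p2 = naiveAbortableRead(reader, ac.signal).catch((e) => e);
const p3 = reader.read();

ac.abort();   // p2 settles with an "AbortError"...
enqueue("A");
enqueue("B"); // ...but "B" is delivered to p2's inner read() and dropped
enqueue("C");

const r1 = await p1; // { value: "A", done: false }
const r2 = await p2; // "AbortError" DOMException
const r3 = await p3; // { value: "C", done: false } — chunk "B" was lost
```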
> - Does this do more than just dequeue the read request? For example, could it be communicated all the way down to the underlying source, I guess via `pull()`? I don't think this helps much, but it's worth checking...
I think just dequeuing the read request is fine.
A while back, I suggested we could pass an `AbortSignal` to `pull()`, but that was tied to `rs.cancel()`. Not sure where we want to go with that. 🤷
Could we also make this work for BYOB readers? We'd probably want something like `reader.read(view, { signal })`, but this has some problems:

- We need to return ownership of the backing array buffer after the read is aborted, but we can't do that if we merely reject with an `AbortError`.
- Before we can return ownership of the buffer, we need to take it away from any pending `byobRequest`. But how will we know when the underlying source is done using `byobRequest.view`?
For BYOB readers I think we'll need to pass an `AbortSignal` to the underlying source, and it has the option of either ignoring the signal or immediately indicating that the read has completed with whatever data has been written into the buffer so far.
As I understand the `AbortSignal` interface, it is valid for an API accepting a signal to ignore it. This would be required for an underlying source implementation which passed the buffer to an API which did not support aborting an operation.
As for resolving reads with partial data, it might not be an issue in practice. For the underlying source implementations I maintain (in the Web Serial API and its polyfill based on WebUSB) supporting BYOB readers would never create a situation in which the buffer is partially filled as it will be returned immediately to the caller whenever any data is available.
Generally sounds good. This is safe in terms of the locking semantics because we cannot release a lock while there is a pending read request. Maybe that is worth noting.
Is there a practical use-case for aborting a read request which has subsequent read requests?
I don't have one. The use case for this coming from Web Serial API developers is being able to unlock the stream without closing it.
Then how about adding a method `abortPendingRequestsAndReleaseLock` to the reader? That method returns a promise.
I haven't heard a sufficiently compelling use case for cancelling a single `read()` while leaving other reads pending.
Strictly speaking, if a call to `read()` caused backpressure to be relieved, then cancelling the `read()` should cause backpressure to be re-established, but we have no mechanism to do that.
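The backpressure point can be made concrete with a small sketch (runnable in Node 18+; the stream shape is illustrative): with `highWaterMark: 0`, it is the pending `read()` itself that triggers `pull()`, and once triggered there is no way to take that back.

```js
// With highWaterMark 0, pull() is only called once a read() is pending.
// There is no mechanism to "un-call" pull() if that read were aborted.
let pullCount = 0;
const rs = new ReadableStream(
  {
    pull(controller) {
      pullCount++;
      controller.enqueue(pullCount);
    },
  },
  { highWaterMark: 0 }, // no eager pulling
);
const reader = rs.getReader();

// Nothing has been pulled yet: desiredSize is 0 and no reads are pending.
const before = pullCount; // 0

const r = await reader.read(); // relieves backpressure, triggering pull()
const after = pullCount;       // 1
```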
The fact that this makes byte streams into a special case worries me greatly.
WICG/serial#122 is another case where this seems to be useful, although again it's possible there are alternate strategies... I'm inclined to think this is a good idea though.
I guess the remaining work is:
- Figure out whether the use cases on hand, and any others we can think of, are better served by invalidating individual reads vs. `abortPendingRequestsAndReleaseLock()`.
- Figure out byte stream integration.
Coming from the Node.js perspective, it would be better for the `AbortController` to be associated with the `Reader` as a whole, rather than with the individual read operation. So...
```js
const controller = new AbortController();
// Assume "rs" contains chunks A and B.
const reader = rs.getReader({ signal: controller.signal });
const promise1 = reader.read();
const promise2 = reader.read();
const promise3 = reader.read();
controller.abort();
```
There are a couple reasons for this...
- Not all read operations are cancelable. libuv fs operations, for instance, cannot be canceled once they are picked up by the libuv thread pool. You can only cancel them while they are still queued. If we consider each discrete read a single libuv request, then the read-level abort is of limited use and unreliable.
- In most operations, it's the entire sequence of reads that is critical. It will be rare to abort a single read. What we'd want is to abort the entire set of read operations and clean up the underlying resource.
When `abort()` is called, I would expect any already fulfilled reads to just ignore it, but all subsequent still-pending read promises to be rejected. So, for the sake of the example, let's assume that `promise1` ends up resolving synchronously (for whatever reason) but `promise2` and `promise3` are still pending. Both `promise2` and `promise3` would reject with an `AbortError`.
I really cannot think of a case where I would want to abort a single read without aborting the entire subsequent sequence of reads that also may be pending.
Thanks @jasnell, that's very helpful. It sounds similar to the discussion a few comments up in #1103 (comment).
The case in WICG/serial#122 seems like it might desire canceling the read without canceling subsequent reads, but upon reflection, I think you could write it in a different way which avoids that, hmm...
It seems this might be the low-level primitive whatwg/fetch#180 wants.
From #1168, it turns out `rs.pipeTo(ws)` is actually cheating. To correctly express the desired behavior when `preventCancel = true`, we may in fact need abortable reads ourselves. 😅
> Figure out whether the use cases on hand, and any others we can think of, are better served by invalidating individual reads vs. `abortPendingRequestsAndReleaseLock()`.
For the use case of `pipeTo()`, we don't really need an `AbortSignal` for each `reader.read()` call; a single signal associated with the entire reader would also suffice.
> Figure out byte stream integration.
All `read(view)` calls reject with an error, so they don't get their BYOB view back. (This is similar to `cancel()`, except that `cancel()` resolves pending reads with `{ done: true, value: undefined }` instead.)
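For contrast, the existing `cancel()` behavior on a default reader can be observed directly (runnable in Node 18+; the empty underlying source is illustrative) — pending reads resolve with `{ done: true }` rather than rejecting:

```js
// reader.cancel() resolves pending read() promises with
// { value: undefined, done: true } — it does not reject them.
const rs = new ReadableStream({ start(controller) {} }); // never enqueues
const reader = rs.getReader();

const pending = reader.read();
await reader.cancel("no longer needed");

const result = await pending;
console.log(result); // { value: undefined, done: true }
```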
The BYOB request should remain valid, so the underlying byte source can continue writing into it. However, it's no longer associated with a pending read-into request, which makes things trickier. Right now, the specification assumes that the first pull-into descriptor corresponds to the first read-into request, but this change would break that assumption.
Instead, we want committing this "detached" BYOB request to behave more like calling `controller.enqueue()`: copying bytes into any non-detached pull-into descriptors (and resolving `read(view)` requests), and putting the remaining bytes into the stream's queue.
...Sounds like a lot of spec refactoring to be done. 😛
FYI, there's some user-level security implications for #1143 that this can help address. While it wouldn't be as important on the client side, Deno servers would greatly benefit from it for securely reading bodies.
Am I correct that, with the change above, we can consider this issue closed, because `releaseLock()` functions as an "abort" method for `read()` which doesn't abort the whole stream? What is the status of implementation of this behavior in browsers?
Correct, #1168 fixes this issue. Closing.
Instead of adding a new API, we changed the behavior of the existing `releaseLock()` method. So the use case in the original question would now look like this:
```js
const reader = rs.getReader();
const promise1 = reader.read();
const promise2 = reader.read();
const promise3 = reader.read();
reader.releaseLock();
// This is now always allowed, even if one or more read() promises are still pending.
// For example, if rs only contains 2 chunks, then promise1 and promise2 are resolved,
// but promise3 will reject with a TypeError.
```
If you want to use an `AbortController`, you can do that manually:
```js
const controller = new AbortController();
const reader = rs.getReader();
controller.signal.addEventListener("abort", () => {
  reader.releaseLock();
});
const promise1 = reader.read();
controller.abort();
```
> What is the status of implementation of this behavior in browsers?
This spec change is hot off the press, so no browser has implemented it yet. We've filed issues with all major browsers; you can have a look at the "implementation bugs" listed in #1168 to track their progress.