w3c/ServiceWorker

Backpressure on fetch integrated with Streams

Closed this issue · 121 comments

Imagine ReadaleStream is integrated with fetch. The most naive way is to add stream() method to Body interface.

interface Body {
    ...
  Promise<ReadableStream> stream();
};

Streams is an API that enables the data producer to know that the consumer doesn't want the data and stop producing data. On the other hand, other data types (text, blob, ...) don't have such feature.
My question is, should the loader start loading body while the Body representation is undetermined?

fetch(request).then(function(res) {
  return wait(longtime).then(function() {
    return res.stream();
  });
}).then(function(stream) {
  // Here we have a readable stream.
});

The above code waits longtime and then demonstrates that it wants to get the body data via ReadableStream interface. Should the loader load the body data while waiting and use the internal buffer unlimitedly, or stop loading while the representation is undetermined?

We can provide that information explicitly when calling fetch, but I don't know if it's a good API.

I'm not sure this is the right place to discuss this issue, but I chose here because it could affect the fetch interface.

It would be good to get input from @domenic.

Note that I expect we would not expose a Stream through a promise. I would expect it to be returned from response.body synchronously. And then if you use one of the methods such as json() it would no longer work or not work predictably.

To explain @annevk's comment a bit more, I believe our thinking was that body would be an instance of RequestBodyStream, which is a ReadableStream subclass that also has methods like json() etc. The .stream() idea doesn't seem bad on first glance, but I haven't thought about it too hard; in what follows I'll assume we're going with RequestBodyStream design.

(Conceptually, .json()s implementation would look roughly like the readableStreamToArray example, except it would decode and concat the chunks into a string instead of putting them in an array, and it would do JSON.parse before returning the result. In reality, it would probably be done much more efficiently, e.g. pre-allocate an ArrayBuffer, read all the data into it, and then decode all at once.)

My expectation would then be that upon fetch() being called, the body created and data is buffered internally (or left in the kernel buffer, perhaps?) up to the stream's high water mark (HWM). Thus, if the user never calls .json() or .read() or .pipeTo(...) or anything else that would read the stream, the maximum memory consumed is equal to max(HWM, Content-Length). (Or perhaps slightly above the HWM if backpressure cannot be exerted fast enough.) If the user later calls .json(), it un-pauses the stream until all the data is read and converted into JSON. Similarly if they call .read(), they consume buffered chunks, until they consume all the chunks that were buffered at which point the stream becomes "waiting" and they need to call .wait().

The choice of HWM could in theory be given to users (e.g. as an option to fetch()), but in practice in Node.js we see that nobody uses such a capability and they just accept the default. Node.js chooses a default of 16 MiB, but implementations could use heuristics (e.g. use a lower number on mobile phones). A HWM of 0 would be fine too.

/cc @tyoshino in case there's something I am forgetting

@domenic json() and friends moved directly to Response/Request. So body would be a "pure" stream object I think.

Thank you. My understanding is the following - Are those right?

  • Request / Response have 'body' property of type ReadableStream.
  • When json(), text(), etc are called the stream gets closed.
  • When read() is called on the stream bodyUsed becomes true.

Yeah, although I'm not sure if we can make the third bullet point work if we keep the stream object pure. Perhaps through some temporary observer? But even that would have to be synchronous for bodyUsed to return the correct value...

The third bullet point is definitely achievable since we'd control the creation of the stream. E.g.

var that = this;
this.body = new ReadableByteStream({
  readInto(/* ... */) {
    // actual work
    that.bodyUsed = true;
  }
});

@domenic #452 (comment)

max(HWM, Content-Length)

min?

The proposed back pressure mechanism looks correct. Regardless of where json() etc. are placed, we want to just start receiving and buffer some amount of data to reduce latency. The amount would be limited by some "high water mark". The buffering could be implemented not by ReadableByteStream but some platform (e.g. Blink) specific buffering library.

third bullet point

I agree that it works.

To simplify implementation, we would want to use ReadableByteStream not only for the body property but also as a backend for json(), etc. as @domenic proposed.

But even if we adopt this approach, it seems we can still choose to determine the body reading method to provide to the user at access on body property. Not on body.read() call.

  1. The body attribute's getter must run the steps below:
    1. If the used flag is set, return null.
    2. Set used flag.
    3. Return a ReadableByteStream representing the associated body.

Unless we want to allow the user to investigate body before deciding what body reading method to use. I guess there's no such need.

Fixed to return the ReadableByteStream for the second and later evaluation on the body property once we choose to use the stream.

Objects implementing the Body ... an associated byteStream (initially set to null), ...

  1. If byteStream is not null, return byteStream.
  2. If the used flag is set, return null.
  3. Set used flag.
  4. Set byteStream to a ReadableByteStream representing the associated body.
  5. Return byteStream.

@tyoshino I guess that could work, but perhaps we should make it a method then given all the side effects. But instead of returning a promise it would be synchronous. Not sure though.

make it a method

@annevk Sounds good

I don't really like the side-effecting method. I'd rather just go with the approach from #452 (comment). It also makes the name bodyUsed a lie as it's more someoneOnceAccessedTheBodyPropertyForUnknownPurposes ;)

Thanks, I will write a draft for the integration. @tyoshino's proposal looks fine. @domenic, can you tell me why you don't like it?

@yutakahirano because it's more complex (both in internal code and user-facing code/concepts) than #452 (comment) and doesn't gain anything.

@domenic

I see. json() and text() are only-once methods. So, we don't have to introduce a little complicated algorithm like #452 (comment).

In your idea, bodyUsed is set when json(), text() or any ReadableStream method is called, and are used to prohibit json() and text() but it must not disable the ReadableStream methods (we'll read() many times). Right? Do we want to prohibit touching ReadableStream methods as well when json() or text() takes place first? Then, we need something else than bodyUsed. Otherwise, we don't (bodyUsed suffices).

Do we want to prohibit touching ReadableStream methods as well when json() or text() takes place first?

Nah, if you do that you'll get what you asked for. (A mess.)

Right. To choose not to do that, I want to give a justification supporting that we're putting a guard against json()-after-read() but not against read()-after-json(). Both of them look unexpected usage. json() would be pumping on the stream. It sounds good to give json() exclusive access to the stream.

@yutakahirano said that it might be useful to keep cancel() available even after json() call as it allows us to abort json(). Is this included in your motivation to keep ReadableStream methods available after json() call?

Regarding guard against read()-after-json(), alternative proposed by @yutakahirano is making stream() only-once method. It's cumbersome that you need to save the returned stream to some variable. But we can reduce the complexity of algorithm inside.

We don't really have any concept of "exclusive access to a stream." If you have a stream object, you can read from it; this is similar to a file handle object. it would be possible to introduce a library or subclass that adds this concept, but I would prefer to let libraries prove the worth of that idea and if we see everyone building such tooling them consider standardizing it.

.cancel() is indeed a good reason to keep ReadableStream methods available.


That said, this kind of situation where a "C++ reader" and a "JS reader" might share a stream seems very reminiscent of the off-main-thread piping question, whatwg/streams#97. In fact you can view it as a subset of that if you define .json() using a concat-stream equivalent:

Response.prototype.json = function () {
  var concatenator = new ConcatStream();
  this.body.pipeTo(concatenator);
  return concatenator.asArrayBuffer.then(convertToString).then(JSON.parse);
};

(npm concat-stream is for Node streams and uses a callback; our hypothetical version uses a promise-returning .asArrayBuffer property.)

I will open a new issue on whatwg/streams and consider if maybe we want to lock the stream during piping.

We don't really have any concept of "exclusive access to a stream."

By "exclusive", I meant exclusive between arrayBuffer(), blob(), ... text() and direct access on the ReadableStream interface. I didn't mean exclusiveness between consumers who see the ReadableStream interface.

My idea in #452 (comment) realizes this by hiding the stream object itself when json(), etc. is called. The script can never touch any method of the ReadableStream interface. We can read from it only by waiting for the promise returned by json(), etc. which he/she called gets fulfilled.

.cancel() is ...

OK

... new issue ...

Good

It seems that whatwg/streams#241 allows "read-and-then-pipeTo", but you don't want to allow "read-and-then-call-json" in this issue, right?
The code in #452 (comment) allows that, I think.

With stream() method, we can solve the problem, though.

We definitely want to allow read and then call json(), e.g. for a file format that has descriptive headers then JSON content

We definitely want to allow read and then call json(), e.g. for a file format that has descriptive headers then JSON content

Yeah, it's useful. But now there's no way to tell the stream how big the header is. Using ReadableByteStream.prototype.readInto()? Then, don't we want to provide json(num_bytes) than just jsonUntilEndOfStream() which we have now? Or, readOneJsonThenStopReading()?

We definitely want to allow read and then call json(), e.g. for a file format that has descriptive headers then JSON content

Oh, sorry, I mistook your intention, but it is a bit different from what I said at #452 (comment) .

And for your use case, we need read(size) which we don't have now.

You would use readInto, yes. Not sure about the utility of json(num_bytes). In my view json() is just a stopgap until we have real streams, and/or a convenience for the 80% case.

What is "readInto"? Is it part of the Streams API?

@tyoshino Thanks, I see. I agree with #452 (comment), having separate parsers is better than making those methods complex.

I published a draft: https://github.com/yutakahirano/fetch-with-streams .
I would appreciate if you could give any comments.

@yutakahirano your definition turns Request and Response objects into subclasses of ReadableByteStream. That will complicate adding members going forward to either. It might be easier to expose ReadableByteStream as an associated object through e.g. a body member.

Thank you for the comment.
I should have noticed that. Fixed (yutakahirano/fetch-with-streams@d540269, yutakahirano/fetch-with-streams@63f9aea).

I chose "stream" as the attribute name for now because Body.body() looks a bit strange to me.

This looks like a pretty good start to me. I will file a few issues in that repo (I hope that is OK instead of keeping all discussion here?)

@yutakahirano it would be response.body and request.body given that Body is a mixin, so not that weird.

Had some chat with ServiceWorker team (horo) about .clone() handling when ReadableStream's read() is used.

I think new Request/Response created by .clone() would have a ReadableStream and,

  • data already read from the ReadableStream of the original Request/Response is not read from one of the new Request/Response
  • data not read yet are mirrored to both ReadableStream regardless of whether the chunk is already queued to the ReadableStream's queue at the time of .clone() call or not
  • .cancel() call on one of the ReadableStreams doesn't affect the other

Do these make sense to you?

@wanderview what do you think?

IIRC the plan was for .clone() to consult the body-used flag and throw if it's true. So that would mean you'd never have to worry about

data already read from the ReadableStream of the original Request/Response is not read from one of the new Request/Response

since it'd just throw in that scenario instead.

Yea, I thought clone() would check bodyUsed and throw if its set.

I have to admit I have not kept up with this thread, but I think the easiest thing to do would be to set bodyUsed when the first byte is read from the stream for any reason.

IIRC the plan was for .clone() to consult the body-used flag and throw if it's true. So that would mean you'd never have to worry about

data already read from the ReadableStream of the original Request/Response is not read from one of the new Request/Response

since it'd just throw in that scenario instead.

@domenic, I thought .read() doesn't set .bodyUsed from your comment

We definitely want to allow read and then call json(), e.g. for a file format that has descriptive headers then JSON content

at #452 (comment) . Is that correct?

@domenic, I thought .read() doesn't set .bodyUsed from your comment

You are right, sorry about that!

We definitely want to allow read and then call json(), e.g. for a file format that has descriptive headers then JSON content

Sorry I missed this earlier, but this seems to add a complexity for only a small amount of convenience gain.

I think it would be a lot more straightforward if client code decided up front if they wanted streaming or non-streaming behavior. If they want non-streaming, then use the convenience routines like json(). Otherwise use read() for streaming. Don't mix the two.

I think the right way to provide a json()-like convenience routine for streams would be to define a JsonStreamReader() or ConsumeAsJson() function. These would use read() to drain the stream and convert to json. This would have the added benefit of working with any stream; not just Request/Response.

Unless there is another reason for read() not to mark bodyUsed, I still vote for marking the flag on the first byte read from the stream.

Anyway, just my two cents. Sorry again if this was discussed long ago and I missed it.

I guess I'm assuming we still want to prevent partial streams from being stored in the Cache. Is that still true?

I think the right way to provide a json()-like convenience routine for streams would be to define a JsonStreamReader() or ConsumeAsJson() function. These would use read() to drain the stream and convert to json. This would have the added benefit of working with any stream; not just Request/Response.

My vision of request.json() is that it works exactly in this way.

So bodyUsed is the same as EOF or stream closed condition? Or does it serve an additional purpose? Do we still need this flag?

My current understanding is that .bodyUsed == true means that "someone is using the stream and you can't use it now". It is different from EOF, .bodyUsed turns true even if being read from .text() and .bodyUsed stays false when there are not data and the stream is closed but nobody used the stream yet.

We are talking about "stream lock" at whatwg/streams#241, and my plan is that .bodyUsed is stream.locked.

Thanks. I think I understand. And I think this plan makes sense:

I think new Request/Response created by .clone() would have a ReadableStream and,

  • data already read from the ReadableStream of the original Request/Response is not read from one of the new Request/Response
  • data not read yet are mirrored to both ReadableStream regardless of whether the chunk is already queued to the ReadableStream's queue at the time of .clone() call or not
  • .cancel() call on one of the ReadableStreams doesn't affect the other

Sorry for my confusion. I'll try to read the spec further before commenting again.

OK I just got back into all this and had to re-read the thread because I forgot what we eventually decided on for bodyUsed.

Given @yutakahirano's above comment, can we rename it to bodyLocked? Or maybe we should just expose it on the stream directly as body.isLocked?

@domenic and I talked a bit at yutakahirano/fetch-with-streams#13. Now we should revisit .bodyUsed and .locked, because they are different concepts.

  • The current .bodyUsed specified at https://fetch.spec.whatwg.org/ represents whether some functions such as text() and json() are called. For example, res.text().then(() => res.text()) will be rejected, because .bodyUsed is turned into true at the first text() call and it will never become false. Note that .bodyUsed becomes true even when no data is read - Think about the case when we call text() on a response that has an empty body.
  • A stream can be locked. The lock will be released when explicitly indicated, the stream gets closed or the stream gets errored. text() acquires a lock, but it releases the lock when the stream is closed or errored.
  • There is another concept: whether body data is read. Let me call it .bodyConsumed for now. It becomes true when some data is read via ReadableStream.prototype.read. res.text() will turn this flag into true when the data is actually read (not when the function is called). Note that res.text() on a response with an empty body doesn't make this flag true.

What concept do we need? Surely .locked is needed, but it seems insufficient. .bodyConsumed is good from some security POV (because it is useful to guarantee that the response is not partially consumed). The problem is .bodyConsumed is hard to implement. @domenic's idea needs a custom ReadableByteStream for FetchAPI, and a custom ExclusiveStreamReader for Fetch API because the standard ExclusiveStreamReader doesn't care about the customized read().
The existing .bodyUsed concept doesn't work with .body attribute, because we can freely read from stream even when .bodyUsed is true.

How about @tyoshino's old idea? It is simple, and we can guarantee that the body is not partially consumed when the flag is true. It is easy to implement and easy to understand. Currently I don't have a better idea.

@domenic's idea needs a custom ReadableByteStream for FetchAPI, and a custom ExclusiveStreamReader for Fetch API because the standard ExclusiveStreamReader doesn't care about the customized read().

I don't understand why? It needs a custom underlying source, but every type of stream has a custom underlying source, so a fetch body stream should too.

One way to phrase it (that's a bit more explicit than my code that you link to) is that the underlying source sets bodyUsed to true anytime it calls enqueue().

How about @tyoshino's old idea?

I am not sure whether that idea gives any value to developers. It just tells them that somebody once accessed the body property---not whether any data has been read from it.

I think we need to step back and ask what the purpose of bodyUsed is? How do we see developers using it?

@domenic's idea needs a custom ReadableByteStream for FetchAPI, and a custom ExclusiveStreamReader for Fetch API because the standard ExclusiveStreamReader doesn't care about the customized read().

I don't understand why? It needs a custom underlying source, but every type of stream has a custom underlying source, so a fetch body stream should too.

One way to phrase it (that's a bit more explicit than my code that you link to) is that the underlying source sets bodyUsed to true anytime it calls enqueue().

Let me confirm:

var response;
var p = fetch(...).then((res) => {
  response = res;
  return res.body.ready;
}).then(() => {
  // This is not generally true, but please assume this holds in this example.
  assert(response.body.state === 'readable');

  return response.text();
});

Will p be resolved, or rejected? text() is called after enqueue() is called, so it should be rejected, right?

Oh I see, you are right. Enqueue is not the right place to put that. Interesting that it seems like it would work if we marked bodyUsed inside a ReadableByteStream's underlying source's readInto, but it will not work with a plain ReadableStream. Do you agree with that?

If that is right, maybe we should not have a bodyUsed until we implement the ReadableByteStream interface?

To be clear, I don't really care what we do here, but I'd like the invariants in the specification to be preserved (around the "used flag").

We expose bodyUsed so developers can detect whether serialize methods (text() et al), clone() methods, and constructing a Request based on another Request, might fail. Not exposing it leads to try/catch or worse (serialize methods return promises).

Sorry, I misunderstood ReadableByteStream.

As you say, we can observe the underlying source's readInto, so we definitely can know when read or readInto is called.

  • Body mixin's used flag is set when stream's readInto is called.
    • But, used flag can be incorrect when the stream is locked to an exclusive lock, to give an optimization chance to implementations.
  • Remove Body.bodyUsed. Because used flag is valid only when stream is not locked.

Can we define like this? This introduces incompatible changes.

  • Body.bodyUsed is removed.
  • Reading an empty body does not set used flag. For example, if res is a response with an empty body, res.text().then(() => res.text()) is resolved with an empty string.

To be clear, I don't really care what we do here, but I'd like the invariants in the specification to be preserved (around the "used flag").

@annevk, I would like know if the followings are important:

  • Body.bodyUsed is exposed to JavaScript.
  • Once used flag becomes true, it will never be false.
  • Calling text() twice will fail, even when the body is empty.

Why would the "used flag" be unset? I could imagine giving in on the other two, depending on the final picture.

If we set used flag based on whether the body data is consumed, the exact value should not be observable when body is locked to a reader. That is why I remove Body.bodyUsed in
#452 (comment). If we must expose the value, we may set used flag when body is locked. If no read call is made and stream is unlocked, used flag will become false again.

Note that one thing you would have to redesign is that if you construct Request from a Request that has a body, we move the body to the new object and set the "used flag" on the old object.

@yutakahirano

Just to clarify one thing:

  • The current .bodyUsed specified at https://fetch.spec.whatwg.org/ represents whether some functions such as text() and json() are called. For example, res.text().then(() => res.text()) will be rejected, because .bodyUsed is turned into true at the first text() call and it will never become false. Note that .bodyUsed becomes true even when no data is read - Think about the case when we call text() on a response that has an empty body.
  • A stream can be locked. The lock will be released when explicitly indicated, the stream gets closed or the stream gets errored. text() acquires a lock, but it releases the lock when the stream is closed or errored.
  • There is another concept: whether body data is read. Let me call it .bodyConsumed for now. It becomes true when some data is read via ReadableStream.prototype.read. res.text() will turn this flag into true when the data is actually read (not when the function is called). Note that res.text() on a response with an empty body doesn't make this flag true.

The .bodyUsed flag is also set when the UA drains the body. For example, Cache.put() will drain the bodies for both the Request and Response objects. Cache.put(req, res) will result in both req.bodyUsed and res.bodyUsed being set to true.

Setting these used flags occurs at the beginning of the put(), before any data is read yet. I believe we did this so that the UA can read the data on another thread without worrying about racing with content for the body data. See steps 3.2 and 5.2.

https://slightlyoff.github.io/ServiceWorker/spec/service_worker/index.html#cache-put

Just wanted to note that as it didn't seem directly covered by your other cases.

@wanderview ReadableStream has "exclusive reader" concept https://streams.spec.whatwg.org/#reader-class that corresponds to .locked in my text. If we use .bodyConsumed as used flag, we also use .locked. Then any other reader cannot read the body, so I think the existing behavior will be preserved (though the spec needs to be updated).

Note: I'm not fully convinced to use .bodyConsumed. #452 (comment) and #452 (comment) look also reasonable as I said before.

Thanks @annevk for helping remind me what bodyUsed was meant for. Namely, something developers can use to avoid try/catch with the convenience methods text() etc.

With that in mind I think bodyUsed, perhaps with a better name, should be true when any of the following is true:

  1. The stream is currently locked
  2. The stream is "closed" or "errored" but has Content-Length > 0.

I think that might be enough? That is, I think these are all of the cases where cloning or the convenience methods will fail.

A couple things to note:

  • If the stream has been partially read (using .read()) but not read to the end, bodyUsed should be false, since you could still call .text() or clone it and get something useful.
  • If text() is ever called on a non-empty body, then bodyUsed will be true, since either the stream will be currently locked, or it will have been closed/errored before being unlocked.
  • If text() is called on an empty stream, then there is no effect. It is safe to call .text() on an empty stream as many times as you want.

Does this sound good? Did I miss anything?

I heard that cache.put and respondWith want to assume that no bytes are read from the body, from @Horo. Is my understanding correct? Is the idea shared among serviceworker guys?

cache.put() checks to see if bodyUsed is set on both Request and Response. If either is true, then it rejects.

cache.put() will also set bodyUsed on both Request and Response if there is a body to read.

So yes, currently, cache expects no bytes to have been read before the put(). I can't think of any implementation reason why we couldn't allow a partial body to be written, but it might be a big footgun for developers.

Reply to @domenic (#452 (comment))

  1. The stream is "closed" or "errored" but has Content-Length > 0.

Did you want to mean this?

stream.state === "errored" || (stream.state === "closed" && at least one byte was enqueued to the stream)

Contents of this comment has been reorganized into the next comment.

Still updating. Please comment if you find anything incorrect in the table.

Determine which column should be represented at bodyUsed, and then do Quine-McCluskey.

Note

  • Assume that res has non-null body.
  • Assume that s.read() never returns an empty ArrayBuffer

Legend

  • Row: Precondition
  • Column: Action

Request

\ read() text() fetch() new Request() cache.put() clone()
Nothing (empty body (stream starts from closed)) N/A Y Y Y Y Y
Nothing (non-empty body) Y Y Y Y Y Y
After read(). stream is not yet closed Y Y [1] Y [1] Y [1] Y Y
After read(). stream is closed N/A Y [1] Y [1] Y [1] Y Y
text() ongoing N N N N N N
After text() on fresh empty stream finishes N/A Y [1] Y [1] Y [1] Y Y
After text() on non-empty stream finishes N/A Y [1] Y [1] Y [1] Y Y
After fetch() N N N N N N
After new Request() N [3] N [3] N [3] N [3] N [3] N [3]
After cache.put() N N N N N N
After clone() where res is fresh Y Y Y Y Y Y

Response

\ read() text() respondWith() cache.put() clone()
Nothing (empty body (stream starts from closed)) N/A Y Y Y Y
Nothing (non-empty body) Y Y Y Y Y
After read(). stream is not yet closed Y Y [1] Y [1] Y Y
After read(). stream is closed N/A Y [1] Y [1] Y Y
text() ongoing N N N N N
After text() on fresh empty stream finishes N/A Y [1] Y [1] Y Y
After text() on non-empty stream finishes N/A Y [1] Y [1] Y Y
After respondWith() N N N N N
After cache.put() N N N N N
After clone() where res is fresh Y Y Y Y Y

You mean new Request(req) (and you forgot req.clone(), though it should be roughly identical to res.clone()).

@annevk Oh, right. Added a note at the footnote.

It seems Anne is basically more concerned about "arbitration".

As Yutaka said in #452 (comment), we were wondering if there's need for "modified flag". Maybe there's no concrete reason to ban drained request/response (#452 (comment)) found so far?

Domenic's this idea (#452 (comment)) is also taking care of dirtyness.

I'm wondering if we should separate these concepts (arbitration flag and modified flag) into different variables.

I'm wondering if we should separate these concepts (arbitration flag and modified flag) into different variables.

If we choose that way (and if text() uses arbitration flag only), I don't think we need @domenic's condition (#452 (comment)). It's ok that text() returns a promise resolved with an empty string when the stream is closed.

yutakahirano@

I heard that cache.put and respondWith want to assume that no bytes are read from the body, from @Horo. Is my understanding correct? Is the idea shared among serviceworker guys?

Sorry I am @horo-t in github :)

From the security perspective, we want to distinguish the partially consumed response.
I think:

  • The partially consumed response should be treated as the response which is created in the ServiceWorker, not the response from the server.
  • The URL of partially consumed response object should be empty.

So my idea is:

  • response.body should be a method. (#606)
  • response.body() set bodyUsed flag, and returns Stream (ReadableStream?)
  • After response.body() was called, both cache.put(response) and FetchEvent.respondWith(rsponse) should fail.
  • If you want to read the header bytes of the response and returns the remaining bytes to the page, you have to create a new response.
self.addEventListener('fetch', function(event) {
  event.respondWith(
    fetch(event.request)
      .then(function(response){
          var stream = response.body();
          stream.readSomeBytes(); // This should be promise?
          return new Response(stream); // This constructor is not speced yet.
        })
  });

@horo-t I'm sorry for the misspelling and thank you for the detailed explanation!

@horo-t can you explain

From the security perspective, we want to distinguish the partially consumed response.

This seems unfortunate. What is the attack here?

I am also confused by

The partially consumed response should be treated as the response which is created in the ServiceWorker, not the response from the server.

What is the (user-facing) difference?

From the security perspective, we want to distinguish the partially consumed response.

This seems unfortunate. What is the attack here?

Sorry this security concerns was form my misunderstanding. It is solved in https://crbug.com/392409.
Please ignore my comments.

Based on Horo's reply (#452 (comment)), I've updated the matrix (#452 (comment)) by changing N to Y where it was marked N due to #452 (comment).

It seems not only cache.put() but new Response(res) and respondWith(res) can also accept partially consumed data. I marked them N based on Domenic's post #452 (comment). But now we can flip them?

Sorry. I didn't understand Anne's this comment (#452 (comment)) correctly. We don't have any Response constructor that takes a Response. Horo taught me. Fixed by removing new Response(res) column.

But we do have a Request constructor that takes a Request. That is what I pointed out.

Yes. Added another table for Request.

@domenic (and probably @annevk), is it critical to prevent text() on a dirty closed Body (I mean, #452 (comment))? Allowing it seems not a big problem to me and greatly simplifies the situation.

@yutakahirano what is a dirty closed body, sorry? Is that

The stream is "closed" or "errored" but has Content-Length > 0.

?

If so I think it makes text() more friendly, if it rejects in those cases. What would you propose text() do in those cases? Return an empty string?

@domenic yes, returning an empty string seems more consistent to me.

If we can return an empty string in that case, I would propose the following for Request.

  • All actions (in @tyoshino's table) will fail when the stream is locked to a reader.
  • All actions except read() will take an exclusive lock.
  • new Request and fetch() will set unusable flag on the Request.
  • All actions will fail when Body's unusable flag is set.
  • new Request and fetch() will fail when Body is modified (i.e. non empty data is read from it).

These rules are moderately simple and cover @tyoshino's table requirements except the "text() on a dirty closed body" case.

(Note: unusable flag introduced here need careful consideration. What happens when accessing the stream of a body whose unusable flag is set?)

@domenic Please also take a look at my post #452 (comment). It seems we no longer have to prohibit cache.put(_, res) and e.respondWith(res) on partially consumed res. Do you agree we can flip them to Y in the table?

Discussed unusable flag with @tyoshino.
We thought of two solutions: Both of them don't need an explicit flag.

  • Do not release the lock even when the stream is fully consumed or an read error occurs. From the outside, body.closed will never be resolved or rejected. This needs an additional lock acquiring mechanism in ReadableStream, because currently specified lock is auto-released when closed or errored.
  • Tee the readable stream, pass the created one to the method (such as new Request) and error the original one. Tee is being specified now.

Regarding this unusable flag stuff, I am afraid we might be losing sight of what is best for developers in favor of what is easier to implement.

@tyoshino +1 for flipping them, sounds good to me.

#606 is resolved. See #606 (comment).

Done #452 (comment). Flipped to Y.

I'd like to fill the cells with ?. Could you guys please post your opinion about whether clone()-ing partially consumed (either by text() or read()) should be allowed or not.

Cloning should be allowed as soon as possible. So when an operation is allowed in a situation, Cloning should be allowed as well. Thus my opinion is that all ? should be Y.
The result body should be marked as modified when the original one is modified.

In general I am in favor of leniency on allowing partially consumed bodies to be used wherever normal bodies are. So Y for me.

Thanks. Updated the table (allow clone() on modified/closed Request/Response) based on

Domenic, Yutaka,

So, finally, I'd like to confirm conclusion about text(), respondWith(), fetch(), etc. call on a "closed" Request/Response which was not empty initially.


IMO, it's strange if the result of text() call on a "closed" Response differs depending on the initial state of the Response. We should do either of the following:

  1. Regardless of the initial state (empty or non-empty), we allow text() on a Response with a "closed" stream.
  2. Have a flag in the Response indicating that the Response is "consumed", and disallow text() on a Response once the Response is "consumed".
  3. Define "consumed" concept in the Streams API, and using it, disallow text() on a Response with a "consumed" stream. Rather than remembering and using the knowledge of the initial state, we define a virtual termination character $, put it in the queue and have some operation to consume it to explicitly mark the stream "consumed". I.e. an originally empty stream can also be "consumed".

(3) looks overkilling to me. I don't have strong opinion between (1) and (2).

+1 for (1). Reading from a partially consumed body will be useful, and (3) is complex for users.

Why can't we just make text() always reject on "closed" response? E.g.

function text() {
  if (this.body.state === "closed") {
    return Promise.reject(new TypeError("stream is closed!"));
  }
  // otherwise, read the stream to the end...
}

In terms of an empty body, I think it should be represented as an empty String / ArrayBuffer (XHR does so). In terms of a fully consumed body, I don't see a strong reason to treat it differently from an empty body.

I don't have a specific example, but developers would want to distinguish an empty response from a communication error.

By the way, readInto() in ReadableByteStream may return 0. That means read() may return an empty ArrayBuffer and it should be ignored.

In terms of an empty body, I think it should be represented as an empty String / ArrayBuffer

Agreed!

In terms of a fully consumed body, I don't see a strong reason to treat it differently from an empty body.

Well, it kind of indicates that somebody already consumed the body, so I thought it would be good to error in that case.

But I realized that an empty string (with no chunks enqueued) is automatically closed. So there is no way to distinguish between "empty" and "read to the end", except I guess out-of-band information like Content-Length. So I see your point.

I don't have a specific example, but developers would want to distinguish an empty response from a communication error.

Agreed with this too. Yet again JavaScript's paucity of error types strikes us :(.


In combination, I end up agreeing with you. So, @tyoshino's 1 from #452 (comment) sounds good to me.

Thanks, Yutaka, Domenic. The table (#452 (comment)) is already updated to reflect the agreement.


Now, all the columns (except for that read() is not allowed for a closed stream) are the same.

After making a little fix on Yutaka's algorithm in #452 (comment), either of the following algorithms should work:

(A) Lock + body passed flag

Request

  • a Request has a boolean body passed flag
  • req.bodyUsed = (req.body is locked) || req's body passed flag is set
  • The following operations fail when req.bodyUsed is set. Otherwise, they acquire the lock of req.body and release it when done.
    • req.arrayBuffer()
    • req.blob()
    • req.formData()
    • req.json()
    • req.text()
  • req.clone() fail when req.bodyUsed is set.
  • The following operations fail when req.bodyUsed is set. Otherwise, they set req's body passed flag, confiscate the underlying source and queue from req.body and error it.
    • fetch(req)
    • new Request(req)
    • cache.put(req, res)

Response

  • a Response has a boolean body passed flag
  • res.bodyUsed = (res.body is locked) || res's body passed flag is set
  • The following operations fail when res.bodyUsed is set. Otherwise, they acquire the lock of res.body and release it when done.
    • res.arrayBuffer()
    • res.blob()
    • res.formData()
    • res.json()
    • res.text()
  • res.clone() fail when res.bodyUsed is set.
  • The following operations fail when res.bodyUsed is set. Otherwise, they set res's body passed flag, confiscate the underlying source and queue from res.body and error it.
    • e.respondWith(res)
    • cache.put(req, res)

Note

  • a Response/Request with body passed flag is unset but .body "errored" is considered to be one whose headers were received successfully but body wasn't
  • a Response/Request with body passed flag set is considered to be invalid

(B) Permanent lock

Request

  • req.bodyUsed = req.body is locked
  • The following operations fail when req.bodyUsed is set. Otherwise, they acquire the lock of req.body and release it when done.
    • req.arrayBuffer()
    • req.blob()
    • req.formData()
    • req.json()
    • req.text()
  • req.clone() fail when req.bodyUsed is set.
  • The following operations fail when req.bodyUsed is set. Otherwise, they acquire the lock of req.body and never release it.
    • fetch(req)
    • new Request(req)
    • cache.put(req, res)

Response

  • res.bodyUsed = res.body is locked
  • The following operations fail when res.bodyUsed is set. Otherwise, they acquire the lock of res.body and release it when done.
    • res.arrayBuffer()
    • res.blob()
    • res.formData()
    • res.json()
    • res.text()
  • res.clone() fail when res.bodyUsed is set.
  • The following operations fail when res.bodyUsed is set. Otherwise, they acquire the lock of res.body and never release it.
    • e.respondWith(res)
    • cache.put(req, res)

Note

  • Needs change on the Streams API spec to hold the lock of a stream even after draining it.

(A) looks pretty good to me. (B) is not too bad---I could add getReader({ preventReleaseOnClose: true })---but it does feel like it is overloading "locked" to mean too many things.

In terms of user-observable consequences, I think (A) allows you to check req.body.state === "closed" after fetch(req) finishes consuming it, whereas (B) causes it to throw.

In terms of user-observable consequences, I think (A) allows you to check req.body.state === "closed" after fetch(req) finishes consuming it, whereas (B) causes it to throw.

In my draft (#452 (comment)), req.body.state stays to be "errored" even after fetch(req) finishes. Do you suggest that we should change it from "errored" to "closed" once fetch(req) finishes?

Oh, sorry, I didn't read them carefully enough. Now I am confused though. Why don't those operations acquire a lock and read to the end? They can still set a flag (maybe "body neutered" is not the right name) that impacts bodyUsed.

For new Request(req)'s case, based on Anne's comment (#452 (comment)), I made it mark the old req unavailable permanently. To represent that concept also on req.body, I made req.body "errored" in (A) of #452 (comment). I thought it would look more consistent if req.body is also made to something indicating unsable-ness (errored or locked (B)) as well as fetch(req), new Request(req) and cache.put(req, res) are prohibited.

I would think in all these cases conceptually what is happening is that the stream is being locked-then-consumed while it is copied over into the new request, or fed to the network, or put into the cache. "waiting" while being consumed, then "closed" after it's done being consumed, makes the most sense to me.