yutakahirano/fetch-with-streams

Uploading: how to handle redirects

Closed this issue ยท 41 comments

I see two options:

  • Do as the spec does now, and tee the stream so we can handle redirects. A bit complicated...
  • Only allow uploads if the redirect mode is "error".
    • Automatically set it to "error" if they supply a stream, or require them to opt-in?
    • Should we allow "manual" also?

In practice I think disallowing upload redirects is not that bad. Redirects for uploads seems pretty weird to me... We could also do that as an initial version and then if there is developer need, loosen the restriction later?

It's not just redirects. It's also authentication. You'd have to set window to null too. There's a note around the place where we tee streams that implementations could optimize for the no-window/no redirects scenario.

How complicated will teeing be by the way?

(It's annoying that if you want to reduce memory usage you have to pass redirect:"error" and window:null. Ideally we have something simpler for that. But perhaps requiring those to be passed when you pass in a stream is okay for a v1.)

Can you explain the window thing? I am not familiar with that part of fetch and what it impacts.

I think teeing is probably not that complicated from a spec perspective. But it might be complicated from an implementation perspective, or at least, complicated to optimize. @yutakahirano had more thoughts on this.

And yeah, I mainly wanted to see how people felt about making v1 a bit more annoying. If we can figure out a way to avoid that (e.g. teeing) that's even better.

It impacts HTTP authentication dialogs. A user agent can redo a request when given a 401/407 response. Setting window to null means there's no place to display the dialog and therefore the request cannot be redone.

Sorry for the late response.

  • Do as the spec does now, and tee the stream so we can handle redirects. A bit complicated...
  • Only allow uploads if the redirect mode is "error".
    • Automatically set it to "error" if they supply a stream, or require them to opt-in?
    • Should we allow "manual" also?

If it's possible to specify when the stream can be send-able without teeing, I would prefer the second option, at least for now. If users want to send a request with RS without such limitation, we can consider the other option at that time.

Do you think we can? E.g., what about 401/407?

Hmm, looks impossible.

Talked with @tyoshino.

  • Teeing all the data until the response arrives is bad for long (or endless) streams.
  • Authentication is crucial. Developers need to be able to upload a stream with authentication.
  • We need to provide a server with a way to respond with seeing all the request body data.

Can we propose a new HTTP status code (say 102) which means there is no more redirect and authentication (and more?)? IIUC 100 is not enough. Do you think it's a good idea? How hard is it to propose such a status code to IETF?

@mnot @reschke I believe you have experience with registering status codes. ๐Ÿ˜Š

The idea would be, if I understand correctly, that we don't stream the request body until the server guarantees with a 102 that the eventual response won't be a redirect or authentication dialog. If it doesn't send a 102 we'll just not transmit the body. If it does send a 102 and then doesn't comply it's a network error.

We might also offer the more expensive tee-all-the-things option, but the above would be much better as a default.

I think I would prefer to tee all the things, or disallow authentication. Causing a roundtrip tax on all from-JavaScript uploads just because a small minority of cases might need authentication seems like a bad tradeoff.

If the server transmits early there would not be much of a tax, if at all. And teeing is a pretty big tax.

Sorry if I missed this, but how does the server respond with anything before the upload body is sent?

Edit: Is this possible with chunked encoding?

The UA can stop teeing after it sees 102 even with the "tee-all-things option".

@wanderview I imagined something like 100 Continue. The server can send some information before sending the final response.

It's not that complicated to define a new status code (but 102 is taken, btw, see http://www.iana.org/assignments/http-status-codes/http-status-codes.xhtml)

However, I'm both skeptical about whether a code is needed, and the likeliness of a server to get this right.

Is there a summary of what problem this would solve? How likely are large uploads that happen before any other request that would trigger redirects or auth?

Finally, how would the new status code be different from 100 (Continue)?

Having thought about this more over the last few days, I maintain that it's not really worth working on this feature if it requires server-side opt in. All the developers we've heard from want to use this uploading functionality on preexisting servers, in most cases outside their control.

So while we could consider minting a new status code (or reusing 100) as some kind of optimization for those that control both sides of the pipeline, I think the initial version of this feature really can't make that a requirement. So either it needs to tee or it needs to fail on redirects and on authenticated endpoints.

@reschke Thank you, then let's use 199 in this thread. The problem is that we need to copy the request body to resend it in case redirect / authentication is required until the response arrives. It leads to a memory bloat. IIUC 100 is not enough - the server can serve 307 after sending 100.

Even when the server doesn't trigger redirects, the client cannot know that until it receives the final response. If the server wants to see the request body to decide the response code (200 or 404, for example), the client needs to remember the entire body.

All the developers we've heard from want to use this uploading functionality on preexisting servers, in most cases outside their control.

This is not true. All the folks wanting to use this to replace their WebSocket stacks have significant control over both ends of the stack and have certainly made noise about wanting this.

Furthermore, we have tried to make the default API for fetch() to be the one that consumes the fewest resources. It would be nice if the least complicated API for uploading was also the most efficient one.

We could still ship with a less efficient model as a first step, but not until we know what the more efficient model would look like so we can preserve the aforementioned aspects of the API.

I tried to think about the 199 model a bit and I'm not entirely sure it helps. Ultimately the client needs to make a decision here about memory and latency.

I think a better model would be that the client requires 199-like semantics or explicitly decides to tee.

If the client requires 199-like semantics, the client can preserve memory usage and start sending the body directly (reducing latency; disregard my earlier comment on this). If the server ends up sending a 401/407/redirect, then we result in a network error. The client should have figured out somehow with the server beforehand that the 199-like semantics can be required. 407 (aka a porxy) makes this tough, but so be it.

API-wise I think it would make the most sense if the explicit teeing instruction was bound to the stream that is passed to fetch() in some way, but we could also add a flag to the fetch() API.

I'm not entirely sure how it should interact with redirect mode. I'm somewhat leaning towards ignoring it in the default non-teeing case and not requiring it to be set to "error" or "manual".

Let me summarize proposals as usual.

I'd define a new term here, "resource status", that is statuses that don't require any User-Agent level action such as authentication handling, redirect handling, etc. and therefore are delivered to the application layer. In #66 (comment), Yutaka said 100 is not good. That's because non resource status may follow 100. Only resource status follows 199.

UA could start sending body

  • (a) immediately following the header
    • (1) No tee. Error on redirect, etc.
    • (2) tee until the client receives non resource status.
    • (3) tee until the client receives 199. (#66 (comment))
  • (b) once the client receives non resource status. No tee.
  • (c) once the client receives 199. No tee.

I understand Julian's comment (#66 (comment)) but not sure if it's reliable that we issue preflight for completing auth given load balancer, etc. So, I'm feeling that we should have (a-2) (and (a-3)), not only (a-1).

(a-2) and (b) requires the server to decide what status code to return before receiving all the body data. If it's impossible, the server could first send 200 and deliver the actual status code later in application level by encoding the status into the body. This might be related to the HTTP trailer API discussion.

(a-3) and (c) requires a new status code and server's opt-in. Solves #66 (comment). The client needs to declare capability of 199 ("Expect: 199"?).

(b) and (c) makes latency and memory consumption of fetch() competitive to WebSocket protocol/API. (a-1) gives us better latency than WebSocket.

Initially I was feeling that the default behavior for streams should be (2) (and (3)) for consistency with other data types. But I don't have strong opinion now.

So, the options would be

  • ""
    • default
    • optimized as (a-2) (a-3)
  • "send-body-when-ready"
    • (b) and (c)
  • "send-body-without-tee"
    • (a-1)

199 support can be added later to enable (a-3) and (c). The client would then start sending "Expect: 199".

FWIW, we removed "Expect" from HTTP in RFC 7231 except for the already defined 100-continue.

Also, for the server to be able to send a new 1xx code, the client does not need to opt-in. If sending the 1xx breaks the client, the client needs to be fixed.

@reschke for the "send-body-when-ready" option you need client opt-in of some kind (otherwise the server does not know it needs to transmit an early reply), but maybe we could define a Fetch-specific header for that, since either a new 1xx or an early status code from the server would do.

What keeps the server from always sending the new 1xx code?

Anyway, this design with client opting asking for the code, and server opting in seems to be identical to the one that was tried before with 100 Continue and which didn't work well in practice -- mainly because the server stacks didn't allow the applications to properly do the 1xx handling (there was a long servlet API spec discussion about that).

The server always sending it would be redundant with the case where the client transmits without teeing or the server can transmit a response status early.

Anyway, it seems we have enough of an idea without a status code so we could consider that again later.

(Pointers to previous discussions welcome.)

What do you think about decoupling the discussion?

  1. Add a means to create a Request object with ReadableStream to the fetch spec. The request body will be teed until the response status is available ((a-2) in #66 (comment)).
  2. Talk about other options ("send-body-when-ready" (b), "send-body-without-tee" (a-1) in #66 (comment)) and how to give such options.
  3. Talk about 199 ((a-3) and (c) in #66 (comment)).

They can be done in parallel.

I think we should consider "send-body-without-tee" and "" (aka "send-body-with-tee") at the same time, as the former should have the simpler API due to it consuming less resources and therefore being preferable. We don't necessarily have to ship the simpler API first, but we should have it figured out. Everything else can wait.

Hmm, OK. I think it's good to choose "send-body-with-tee" as the default option, because it's the easiest one to use of three proposed options (Note: I changed my mind from #66 (comment) assuming we will be able to rely on some optimization such as 199 in the future).

I think if we do teeing by default we're not being consistent. Elsewhere we require clone() to be invoked to get duplication and an increase in memory usage. I think we should have something similar here. Or at least force it as a choice somehow.

I think if we do teeing by default we're not being consistent.

Hmm, can you explain? When we stream uploads that are not created by page authors, we tee, and get the duplication and increase in memory usage. If we're shooting for consistency, I think being consistent with this platform behavior is the most important factor.

As @yutakahirano says, that is the easiest to use. That is presumably why we automatically tee in those cases. I don't see why it should be different for author-created streams.

Whenever we had the option so far, we opted for better performance. I think we should continue to aim for that.

Could be interesting to do some telemetry to see how often teeing by default is wasted/used.

Have to agree

Sorry for the late response.
Thank you, then I am updating #66 (comment):

  1. Add a means to create a Request object with ReadableStream to the fetch spec. The fetch will be terminated when redirect or authentication are required ((a-1) in #66 (comment)).
  2. Talk about other options ("" (a-2) (but let's say "send-body-with-tee"), "send-body-when-ready" (b) in #66 (comment)) and how to give such options.
  3. Talk about 199 ((a-3) and (c) in #66 (comment)).

Notes for F2F discussion:

  • What happens if you get a 301 (etc.)?
  • What happens if you get a 401/407?

Possible solutions:

  • Error the upload on 3xx/4xx. Two sub-options:
    • Automatically opt-in to these semantics when body is a stream
    • Require that whenever you set body to a stream, you also set these options (redirect mode = error, authentication mode = error)
  • Tee the stream, send one branch up, wait until response arrives before deleting other branch
  • New status code (199)
    • Allows server to say "throw away the second tee branch", I think

F2F:

  • Only fail if fetch need to replay the stream & can't
  • You opt into this behaviour by providing a stream
  • If this fails, there's no programmatic way of detecting why (connection fail vs 307 received)

Also, we decided not to provide opt-in for automatic handling of redirects / authentication right now, and not do a new status code. We can do those as needed later on.

Also, we decided to allow redirects for 303, since that drops the body and changes the method to GET reliably. (Doing the same for 301/302 when the method is POST could also be done, but seems somewhat less clean.)

So the conclusion is that the first release should be (a-1) in #66 (comment), right?

@yutakahirano yes (with 303 as an exception since it has simple semantics).

Thanks.

@annevk

Doing the same for 301/302 when the method is POST could also be done, but seems somewhat less clean

Isn't the rule just "if the body is needed a second time, it fails"?

The problem is that what browsers do for 301/302 is somewhat against the spirit of HTTP. So further enshrining it here does not seem great.

This can now be closed.