swift-server/async-http-client

Error when using HTTP2 for AWS S3 request

pballart opened this issue · 20 comments

Hello! I found an issue while using Vapor 4 and the Soto library to upload objects to and delete objects from a DigitalOcean S3-compatible bucket. After some debugging, which you can follow in soto-project/soto#608, we arrived at the conclusion that the issue must be in the AsyncHTTPClient library.

You can follow the discussion in the linked issue but to sum up:

  • Using HTTP/2, the delete object request fails with a NotImplemented error, which according to the docs means there is some unknown header.
  • Using HTTP/1 the request works fine. It also works fine when using HTTP/2 and proxying through Charles, and even when doing the request manually with cURL.

Given that you have a Charles proxy in the way, can you please print the headers that Charles observes in the HTTP/2 case, and compare them to what it prints for cURL, also running through the proxy?

Oh, sorry, I see that this doesn't reproduce if you use HTTP/2 through Charles.

Can you produce a self-contained sample that I can use to reproduce this issue?

I am not sure how to produce a self-contained sample...
On one side we need a DigitalOcean Storage bucket, and on the other side a Vapor project using the Soto library, which in turn uses AsyncHTTPClient. Is there another way? I saw there are some open issues related to HTTP/2; maybe this is related to one of them.

I can set up a DigitalOcean Storage bucket for testing purposes, so what I really want is a Soto program that will reproduce the issue against a DO Storage bucket. I can go from there.

Perfect, I will prepare that 👍
Thanks

friendly ping @pballart

Hello! Sorry I forgot about this one 😓
Here you have the sample project: https://github.com/pballart/soto-s3-digitalocean
You just need to configure the access to the bucket and try the deleteImage endpoint.
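
The failing call boils down to roughly this (a sketch based on the sample project; the credential and endpoint setup labels are approximations of the Soto 6 API, and the bucket/key values are placeholders):

import SotoS3

let client = AWSClient(
    credentialProvider: .static(accessKeyId: "<KEY>", secretAccessKey: "<SECRET>"),
    httpClientProvider: .createNew
)
let s3 = S3(
    client: client,
    region: .init(rawValue: "fra1"),
    endpoint: "https://fra1.digitaloceanspaces.com"
)

// The request that fails over HTTP/2:
let future = s3.deleteObject(.init(bucket: "<BUCKET_NAME>", key: "image.jpg"))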

Thanks for providing the reproducer!

This issue is really on Digital Ocean's end, and they should fix it. The problem is that they don't like the HTTP/2 frame sequence we're sending. Specifically, we're producing this request:

Request:
  DeleteObject
  DELETE https://fra1.digitaloceanspaces.com/<BUCKET_NAME>/image.jpg?x-id=DeleteObject
  Headers: [
    user-agent : Soto/6.0
    content-type : application/octet-stream
  ]
  Body: empty

We transform this into the following sequence of HTTP/2 frames (formatted in the style of RFC 9113):

HEADERS
  - END_STREAM
  + END_HEADERS
  :path = /<BUCKET_NAME>/image.jpg?x-id=DeleteObject
  :method = DELETE
  :scheme = https
  :authority = fra1.digitaloceanspaces.com
  user-agent = Soto/6.0
  content-type = application/octet-stream
  x-amz-date = 20220805T155458Z
  x-amz-content-sha256 = e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
  authorization = <REDACTED>

DATA
  + END_STREAM
  {zero length, no data}

That is, we send our HTTP/2 HEADERS frame without END_STREAM, and then send a zero-length DATA frame with END_STREAM. This produces a request whose body has zero length.
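
For illustration, this is roughly what that frame sequence looks like if written by hand against swift-nio-http2's frame-payload API (a sketch, not AHC's actual code; AHC produces it via the codec mentioned below):

import NIOCore
import NIOHPACK
import NIOHTTP2

// `stream` is assumed to be an HTTP/2 stream child channel operating in
// frame-payload mode.
func sendDeleteObject(over stream: Channel) {
    let headers = HPACKHeaders([
        (":method", "DELETE"),
        (":scheme", "https"),
        (":authority", "fra1.digitaloceanspaces.com"),
        (":path", "/<BUCKET_NAME>/image.jpg?x-id=DeleteObject"),
        ("user-agent", "Soto/6.0"),
        ("content-type", "application/octet-stream"),
    ])

    // HEADERS frame without END_STREAM, because a body part may still follow...
    stream.write(NIOAny(HTTP2Frame.FramePayload.headers(.init(headers: headers, endStream: false))), promise: nil)
    // ...then a zero-length DATA frame that carries END_STREAM.
    stream.writeAndFlush(NIOAny(HTTP2Frame.FramePayload.data(.init(data: .byteBuffer(ByteBuffer()), endStream: true))), promise: nil)
}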

Digital Ocean is incorrectly rejecting this. They want us to drop the DATA frame and attach END_STREAM to the HEADERS frame instead. This is not compliant with the RFC: DATA frames are absolutely allowed to be zero length, and it is entirely fine to send the END_STREAM flag this way.

Working around this is extremely painful: it would force AHC to remove its reliance on the HTTP2FramePayloadToHTTP1ClientCodec, which is doing a lot of heavy lifting to transform between HTTP/1 and HTTP/2. Is there a sensible way for you to report this as a bug to Digital Ocean?

Thanks for the details. I opened a ticket in their support center with a link to this discussion.
I'll keep you posted 👍

Hi! One of the DigitalOcean Spaces engineers here.

Thanks for the reproduction case in https://github.com/pballart/soto-s3-digitalocean.
It was really helpful.

The HTTP/2 stack in use is almost-stock Envoy Proxy, but the problem is a few layers deeper, at the intersection with our S3 backend: we use Ceph for S3 (this has been published and we've given talks about it).

Envoy converts HTTP/2 requests to Transfer-Encoding: chunked when talking to HTTP/1.1 systems.
The empty DATA frame is converted to a correspondingly empty chunked body, i.e. nothing but the terminating zero-length chunk.

With that in mind, I produced an HTTP/1.1 version of the bug:

curl 'https://REDACTED.nyc3.digitaloceanspaces.com/empty' -X DELETE -H 'x-amz-date: 20220818T235940Z' -H 'Authorization: AWS4-HMAC-SHA256 Credential=REDACTED/20220818/nyc3/s3/aws4_request,SignedHeaders=host;x-amz-content-sha256;x-amz-date,Signature=REDACTED' -H 'x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'  -v -H "Transfer-Encoding: chunked" --http1.1

AWS gets it correct, Ceph gets it slightly wrong :-(.

This behavior has the potential to affect S3 operations that have no payloads. However, some of them DO work...

On the plus side, fixing Ceph looks like it would be easy; just a few operations are missing here:
https://github.com/ceph/ceph/blob/main/src/rgw/rgw_rest_s3.cc#L5639-L5691

As a workaround for now, can you send Content-Length: 0 for these operations in your Soto system? (I'm not familiar with Swift, so I don't know how hard that would be).

As a workaround for now, can you send Content-Length: 0 for these operations in your Soto system? (I'm not familiar with Swift, so I don't know how hard that would be).

@adam-fowler would this be excessively hard? It's a great fix: swift-nio will automatically behave better in essentially all environments if we do this.

This would be easy enough. Also @pballart could write a Soto middleware that adds the header in.
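
For reference, such a middleware would look roughly like this (a sketch; the AWSServiceMiddleware hook and the mutable httpHeaders property are assumptions about the Soto 6 API, and as the rest of the thread shows, this particular header ends up being stripped by AHC anyway):

import NIOHTTP1
import SotoCore

struct ContentLengthHeaderMiddleware: AWSServiceMiddleware {
    func chain(request: AWSRequest, context: AWSMiddlewareContext) throws -> AWSRequest {
        var request = request
        // Advertise the empty body explicitly instead of relying on the
        // zero-length DATA frame / empty chunk.
        request.httpHeaders.replaceOrAdd(name: "Content-Length", value: "0")
        return request
    }
}

// Attached when building the service client, e.g. something like:
// let s3 = S3(client: awsClient, ...).with(middlewares: [ContentLengthHeaderMiddleware()])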

What would be the advantage of writing a middleware instead of adding it by default for all cases? Or is there a case where adding this is not encouraged?
I can definitely add the middleware, just thinking about the best solution here...
Happy to hear your thoughts 😄

What would be the advantage of writing a middleware instead of adding it by default for all cases? Or is there a case where adding this is not encouraged?
I can definitely add the middleware, just thinking about the best solution here...
Happy to hear your thoughts 😄

I only suggested the middleware as it appears the issue will be resolved on Digital Ocean's side and any solution on the client side would only be needed temporarily.

That makes sense. I'll try the middleware and let you know 👍

Me again 😓
I added the middleware but it's still not working:
https://github.com/pballart/soto-s3-digitalocean/blob/content-length-header/Sources/App/ContentLengthHeaderMiddleware.swift

You can check it in the content-length-header branch.

We always remove Content-Length headers in AHC:

self.remove(name: "Content-Length")

You will instead need to set the body to .empty and AHC will set Content-Length: 0 for you. However, we do not set a Content-Length for the methods .GET, .HEAD, .DELETE, .CONNECT, or .TRACE, as defined in the RFC:

A user agent SHOULD NOT send a Content-Length header field when the request message does not contain a payload body and the method semantics do not anticipate such a body.
— https://www.rfc-editor.org/rfc/rfc7230#section-3.3.2

case .known(0):
    // if we don't have a body we might not need to send the Content-Length field
    // https://tools.ietf.org/html/rfc7230#section-3.3.2
    switch method {
    case .GET, .HEAD, .DELETE, .CONNECT, .TRACE:
        // A user agent SHOULD NOT send a Content-Length header field when the request
        // message does not contain a payload body and the method semantics do not
        // anticipate such a body.
        break
    default:
        // A user agent SHOULD send a Content-Length in a request message when
        // no Transfer-Encoding is sent and the request method defines a meaning
        // for an enclosed payload body.
        self.add(name: "Content-Length", value: "0")
    }

There is currently no way to force AHC to send a Content-Length header.
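
For illustration, even handing AHC an explicitly empty body on a DELETE goes down the .known(0) path above and still omits the header (a sketch using the HTTPClientRequest API, with a placeholder URL):

import AsyncHTTPClient
import NIOCore

var request = HTTPClientRequest(url: "https://fra1.digitaloceanspaces.com/<BUCKET_NAME>/image.jpg")
request.method = .DELETE
// A known zero-length body: AHC sees .known(0), hits the .DELETE case above,
// and therefore still sends no Content-Length header.
request.body = .bytes(ByteBuffer())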

This is one of those times when I begin to wonder whether we need to start having a Quirks configuration option. Something that lets us tweak our behaviour so that users can override our choices in circumstances like this, where we're technically correct but failing to interoperate with an existing broken implementation.

Hi,

Just wanted to leave an update here. There turn out to be some gotchas in the "easy" server-side fix I was proposing:
ceph/ceph#47773

They need other changes to ensure security still functions as expected, so it's not going to be as quick as I'd hoped.

I'm using Soto 7 and AHC 1.21.2, making requests to DigitalOcean Spaces. Some requests (like PutObject) work, but I had to set the configuration to .http1Only for other requests (like GetObjectAcl) to work. I've asked DO whether they support HTTP/2 and am awaiting an answer.
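
For anyone hitting the same behaviour, forcing HTTP/1.1 at the AsyncHTTPClient level looks roughly like this (the httpVersion option is real; the Soto wiring in the comment is an approximation for Soto 7):

import AsyncHTTPClient

var configuration = HTTPClient.Configuration()
// Disable HTTP/2 so every request goes out as HTTP/1.1.
configuration.httpVersion = .http1Only
let httpClient = HTTPClient(eventLoopGroupProvider: .singleton, configuration: configuration)

// Then hand this client to Soto, e.g. roughly:
// let awsClient = AWSClient(credentialProvider: ..., httpClient: httpClient)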