Automattic/knox

putFile/putStream API should handle HTTP 307 redirects from AWS S3

shafdog opened this issue · 31 comments

I totally get that the "low-level" knox API should leave this kind of handling to the client. But I wanted to suggest that the "high-level" knox S3 API handle the cases where AWS S3 returns an HTTP 307 status code.

Amazon's S3 docs say:

"If you create a bucket using <CreateBucketConfiguration>, applications that access your
bucket must be able to handle 307 redirects."

This happened to me when I used #53 a few minutes after creating a new bucket in AWS. Once DNS is in sync with S3, the case is likely rarer.

Obviously there are workarounds for this (one is sketched below), but I wanted to bring it up since knox is working pretty well for me otherwise. Thanks!
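For anyone hitting this in the meantime, a minimal sketch of one such workaround: retry the upload against the host S3 names in the 307's Location header. It assumes knox's documented createClient/putFile API; the helper, bucket name, and environment variables are placeholders of mine, not part of knox.

  var knox = require('knox');
  var url = require('url');

  var opts = {
    key: process.env.S3_KEY,
    secret: process.env.S3_SECRET,
    bucket: 'mybucketname'
  };
  var client = knox.createClient(opts);

  // Hypothetical helper: on a 307, retry the upload against the host
  // S3 names in the Location header.
  function putFileFollowing307(src, dest, cb) {
    client.putFile(src, dest, function (err, res) {
      if (err) return cb(err);
      if (res.statusCode !== 307) return cb(null, res);

      // The Location host usually already includes the bucket, e.g.
      // mybucketname.s3-us-west-1.amazonaws.com, so strip that prefix
      // before handing the remainder to knox as an endpoint.
      var host = url.parse(res.headers.location).hostname;
      var redirected = knox.createClient({
        key: opts.key,
        secret: opts.secret,
        bucket: opts.bucket,
        endpoint: host.replace(opts.bucket + '.', '')
      });
      redirected.putFile(src, dest, cb);
    });
  }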

Can you expand on how we might do this? It sounds like a good idea but I'm not sure exactly what it implies.

Using request might help with this.

Sorry I didn't comment sooner. I wrote this up for completeness; the library works great. I wanted to look at the code before commenting.

But I did run into this case. It happened to me right after the bucket was created. My belief is that since AWS S3 relies on the bucket name as part of the DNS hostname, if DNS hasn't replicated yet, AWS handles that case by responding with a 307 while replication is pending. I think it can also happen if there's a problem at AWS where a data center goes down and they need to redirect the S3 traffic.

I've used restler as an HTTP client in other code. I'd never heard of superagent, but it looks like client-side code...

Found this article that explains the underlying cause of the 307 HTTP redirects:

http://docs.amazonwebservices.com/AmazonS3/latest/dev/Redirects.html

This is important: I often run into this problem with new buckets. I think using the request library might be a good idea, not only for redirect support but also for improved source code readability.

@magwo pull request welcome; otherwise I'll try to move this up the priority list.

I think the big question here is whether the knox lib should respect the 307 permanently, or keep going to the original endpoint but follow the redirect every time. What do you think?

Edit: Actually, since 307 is a temporary redirect, it should keep going to the original endpoint. But what about 301 or 303 or the like? I'm fairly sure I've gotten those from S3 on several occasions, and the knox version at the time did not handle them gracefully. Does it currently support that?

I might look into implementing this. Switching to the request library internally might be an idea, but it could also be a bit risky, since the Node request object is exposed by the API. Maybe if the request library is 100% API-compatible with Node requests, this would not be a breaking change. Not sure about the details of request.
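For reference, a rough sketch of what redirect handling looks like with request itself (this is not knox's API; S3 auth signing is omitted, and the URL and payload are placeholders):

  var request = require('request');

  var payload = JSON.stringify({ hello: 'world' });

  request.put({
    url: 'https://mybucketname.s3.amazonaws.com/obj2.json',
    body: payload,
    // request only follows redirects on GET/HEAD by default;
    // this makes it follow them on PUT as well.
    followAllRedirects: true
  }, function (err, res, body) {
    // res.statusCode reflects the final response after any redirects
  });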

> I think the big question here is whether the knox lib should respect the 307 permanently, or keep going to the original endpoint but follow the redirect every time. What do you think?

Even in the case of a 301, I don't think it's the knox client's responsibility to maintain a table of redirections.

I think AWS's PermanentRedirect results if you use a bucket name but get the bucket's region wrong. I haven't tried that or seen one, however.

Amazon's docs are silent on when you'd get a PermanentRedirect (HTTP 301); see http://docs.amazonwebservices.com/AmazonS3/2006-03-01/API/ErrorResponses.html to see how vague they are.

But if you create buckets on the fly, it's the 307 that you'll run into.

Agreed that it's not knox's responsibility to keep track of which URLs redirect where. However, the API user will probably want to know if a permanent redirect was automatically followed; this should not happen silently. Are there any proper ways to do this in the current API? There's no injected logging or similar, as far as I can see.
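One imaginable shape for a non-silent design, purely as a sketch: if knox ever followed redirects internally and its client emitted events, the caller could observe them. Neither the behavior nor the event exists in knox today.

  // Hypothetical: knox has no such event. Shown only to make the
  // "not silently" requirement concrete.
  client.on('redirect', function (statusCode, from, to) {
    console.warn('S3 redirected (%d): %s -> %s', statusCode, from, to);
  });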

If it should not be done silently, then I don't think it should be done at all.

+1 on this. I get 307s occasionally; please tell me if I can help in any way.

I have this issue as well with 307 redirect statusCode after upgrading to 0.8.0.

Before 0.8.0, requests were made to http://bucketname.s3.amazonaws.com:443; after 0.8.0 it looks like it's making requests against http://s3.amazonaws.com:443.

Is this the cause of the issue?

@mpalmerlee this is great information; would love your help debugging. Could you confirm that the 0.7.x requests were to http://.s3.amazonaws.com:443 whereas the 0.8 ones were to http://s3.amazonaws.com:443? The second looks more correct than the first, so I'm confused.

I had put angle brackets around the word "bucketname" in my URL above, and apparently GitHub didn't escape them for me, so they weren't showing.

I can confirm that pre-0.8 my requests would go to:
http://bucketname.s3.amazonaws.com:443

After I upgraded it used:
http://s3.amazonaws.com:443

And I saw the 307 responses.

I know this because in my unit tests I use the "nock" library to mock out my HTTP requests, and in 0.8 I saw the new URL.
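For illustration, a minimal sketch of that kind of test (bucket and object names are placeholders, and it assumes nock matches on the request host):

  var nock = require('nock');

  // Under 0.7.x this intercept matched, because knox addressed the
  // bucket as a subdomain; under 0.8.0 the request went to
  // s3.amazonaws.com instead and the intercept was never hit.
  var scope = nock('https://mybucketname.s3.amazonaws.com')
    .put('/obj2.json')
    .reply(200);

  client.putFile('./obj2.json', '/obj2.json', function (err, res) {
    scope.done(); // throws if the expected request was never made
  });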

Hope that helps,
-Matt

Can you give the full URLs? Feel free to insert a line like console.log(req.url) near the end of Client.prototype.request in knox's lib/client.js.

In Client.prototype.request, it looks like the difference is in the options, so I used this: console.log(options)

0.8.0:
  { hostname: 's3.amazonaws.com',
    agent: false,
    method: 'PUT',
    path: '/obj2.json',
    headers:
     { Expect: '100-continue',
       'Content-Length': 49,
       'Content-Type': 'text/plain; charset=UTF-8',
       Date: 'Mon, 20 May 2013 03:23:53 GMT',
       Host: 'mybucketname.s3.amazonaws.com',
       Authorization: '...' },
    proto: 'https',
    port: 443,
    host: 's3.amazonaws.com:443' }

0.7.1:
  { hostname: 'mybucketname.s3.amazonaws.com',
    agent: false,
    method: 'PUT',
    path: '/obj2.json',
    headers:
     { Expect: '100-continue',
       'Content-Length': 49,
       'Content-Type': 'text/plain; charset=UTF-8',
       Date: 'Mon, 20 May 2013 03:25:56 GMT',
       Host: 'mybucketname.s3.amazonaws.com',
       Authorization: '...' },
    proto: 'https',
    port: 443,
    host: 'mybucketname.s3.amazonaws.com:443' }

Fascinating! Now if only I could reproduce this locally... Thanks very much for that; as you can see something's clearly changed :(.

No problem! Good luck!

Can you try changing

  var options = { hostname: this.endpoint, agent: this.agent }

to

  var options = { hostname: this.host, agent: this.agent }

?

That seemed to fix it, at least for my tests!

Great, thank you!!! Will push out a 0.8.1 shortly.

Is this still unfixed? Does anyone want to put $ towards a bug bounty?

As a side note, we've recently started porting parts of our infrastructure over to aws-sdk-js, with excellent results so far.

@guille Do you have any thoughts on them bundling everything as one package? I like knox because it does only one thing, which is in line with the Node philosophy.

@LinusU Also, as far as I can tell, aws-sdk doesn't seem to support third-party (non-Amazon) providers, whereas knox works fine with them.

Please set the endpoint or region options when initializing the client.
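For example, a sketch using knox's documented client options (the bucket, region, and endpoint values here are placeholders):

  var knox = require('knox');

  var client = knox.createClient({
    key: process.env.S3_KEY,
    secret: process.env.S3_SECRET,
    bucket: 'mybucketname',
    region: 'us-west-2'
    // or, equivalently: endpoint: 's3-us-west-2.amazonaws.com'
  });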

Hey, I just hit this issue when setting up https://github.com/stephenyeargin/hubot-grafana, which uses knox. I was debugging for quite some time why it didn't work in the first place, until I found this issue report, which advised me to just wait for the problem to resolve itself. Of course I would have liked to do something more productive with the last 2 hours ;-). Any chance this gets solved?

@dirkaholic Same situation here. I couldn't figure it out, and the issue seems unlikely to be resolved any time soon. However, in my case it's just that knox doesn't support the signature algorithms used in eu-central-1; a bucket in us-standard works right away. See also #254.