zkat/make-fetch-happen

can't fetch url which will redirect to another url with different protocol

dead-horse opened this issue · 13 comments

const fetch = require('make-fetch-happen');
fetch('http://registry.npm.taobao.org/istanbul-api/download/istanbul-api-1.1.8.tgz').catch(err => console.log(err.stack));

=>

Error: Protocol "https:" not supported. Expected "http:"
    at new ClientRequest (_http_client.js:54:11)
    at Object.exports.request (http.js:31:10)
    at exports.request (https.js:199:15)
    at fetch.Promise (/Users/deadhorse/git/github.com/eggjs/egg/node_modules/._node-fetch-npm@2.0.0@node-fetch-npm/src/index.js:54:17)
    at fetch (/Users/deadhorse/git/github.com/eggjs/egg/node_modules/._node-fetch-npm@2.0.0@node-fetch-npm/src/index.js:41:10)
    at ClientRequest.req.on.res (/Users/deadhorse/git/github.com/eggjs/egg/node_modules/._node-fetch-npm@2.0.0@node-fetch-npm/src/index.js:114:17)
    at emitOne (events.js:96:13)
    at ClientRequest.emit (events.js:188:7)
    at HTTPParser.parserOnIncomingClient (_http_client.js:473:21)
    at HTTPParser.parserOnHeadersComplete (_http_common.js:99:23)
curl -I http://registry.npm.taobao.org/istanbul-api/download/istanbul-api-1.1.8.tgz

HTTP/1.1 302 Found
Server: nginx/1.4.6 (Ubuntu)
Date: Wed, 10 May 2017 05:11:05 GMT
Connection: keep-alive
X-Current-Requests: 1
Location: https://cdn.npm.taobao.org/istanbul-api/-/istanbul-api-1.1.8.tgz
X-ReadTime: 4

node-fetch will use the same agent when request the tow different protocol url.

maybe node-fetch should accept tow different options like httpAgent and httpsAgent to support redirect with different protocol.

zkat commented

ahahah I was literally about to post this issue myself. I wonder why I didn't get notified. The solution to this is for make-fetch-happen to do its own redirects, so we can manage proxies and agents as redirects fly through (and cache them?).

In fact, we're gonna have to do that eventually anyway because it's required for supporting cookies.

Should we just hook follow-redirects here?

zkat commented

@hemanth that's a bit too low-level for our needs here. Aside from needing to switch between http and https, we need to be able to switch agents on several other parameters, including potentially switching proxies along the redirect route. Basically, we need to call getAgent() between each redirect to set a new agent for the specific target.

This would all likely be done automatically by simply having recursive calls to fetch() itself, but yeah.

This would all likely be done automatically by simply having recursive calls to fetch() itself, but yeah.

Phew!

I would be interested in tackling this if it is still in need of help!

zkat commented

@nlkluth all yours! Thanks a bunch for the help!

Made a crude fix to the first comment, but I could use help on the overall approach. "redirects" and "proxies" and "fly-throughs" are a bit above my head 🤣

nlkluth@7c851bb

zkat commented

@nlkluth the solution is a bit more involved than this:

  1. recursive fetch calls need to go back to the toplevel fetch. Redirects are cacheable, so the whole cache logic should be followed.
  2. node-fetch-npm should always receive opts.redirect=manual, but if we received manual, we need to skip all redirect logic on our side
  3. recursive calls to fetch() during redirects have the use the defaulted fetch that the user originally called, not the fetch we have in module.exports
  4. we need to replicate all the redirect logic currently in node-fetch-npm, which includes erasing the Authenticate header in the right situations
  5. you need to figure out what the right thing to do is when it comes to opts.agent. One thing we could possibly do is to erase the agent on redirect only if the user hasn't provided a custom opts.agent value. That might be the best thing to do here -- if they do that, it's on them to make sure their agent can switch protocols. If we did not receive a custom opts.agent, though, we need to clear agent and make sure getAgent() runs on the recursive call and figures out the right agent for the redirect location.

Awesome, thanks for the guidance!

I could use a little more clarification. On 1 and 3, I'm not sure exactly what is meant by redirects using the defaulted top-level fetch. I do see that the user will create a defaultedFetch, but I'm not sure how to approach using that.

I didn't see an agent as mentioned in point 5, but I assume that's because I'm not redirecting properly

Here is what I have so far, but I am using the fetch from module.exports

https://github.com/nlkluth/make-fetch-happen/blob/latest/index.js#L363

Thanks in advance!

zkat commented

@nlkluth As far as 1 goes, what I meant there is that you should use cachingFetch, which you're using now!

in retrospect about 3, I guess it's enough to just make sure you thread opts through, since defaultedFetch is just a collection of options.

As far as 5 goes, see https://github.com/nlkluth/make-fetch-happen/blob/latest/agent.js#L28-L30. I guess it'll probably Just Work™ 🤔 we don't set the agent in opts after all. Ok! That's probably easier than I thought!

zkat commented

Oh! One more note: I am not the original author of node-fetch-npm (it's a fork of node-fetch, and make-fetch-happens has a more permissive license than node-fetch does. I'm not a lawyer so I don't know how much the code needs to change or if there's other requirements, but we should definitely not be copy-pasting any source code directly from node-fetch-npm. I think at a minimum, you should rewrite it as a reasonable equivalent (we're working off a spec, so there's ultimately only one way to do it).