apify/crawlee

HTTP client switching

B4nan opened this issue · 4 comments

Currently, we use got-scraping for making HTTP calls, but that is getting outdated as we speak. We need to find a way to allow using different HTTP clients, like curl-impersonate or axios.

The interface should mimic that of got-scraping for backward-compatibility reasons (https://github.com/sindresorhus/got/blob/main/documentation/2-options.md#url), omitting features not supported by https://github.com/apify-projects/node-curl-impersonate/tree/master. Index signatures will be used to stay compatible with unusual usages of got-scraping.
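To make the idea concrete, here is a rough sketch of what such an interface could look like. Every name in it (`HttpRequestOptions`, `HttpResponse`, `BaseHttpClient`, `StubHttpClient`, `splitOptions`) and the list of supported keys are my assumptions for illustration, not existing Crawlee API and not the shape the PR will necessarily take.

```typescript
// Hypothetical sketch: none of these names exist in Crawlee today.

// A got-like options bag. Only a few options are typed explicitly.
interface HttpRequestOptions {
    url: string;
    method?: string;
    headers?: Record<string, string>;
    body?: string;
    // Index signature: any other got-scraping option still type-checks,
    // even if a given client implementation ends up ignoring it.
    [key: string]: unknown;
}

interface HttpResponse {
    statusCode: number;
    headers: Record<string, string>;
    body: string;
}

// The contract a pluggable client (got-scraping, curl-impersonate, axios...)
// would have to implement.
interface BaseHttpClient {
    sendRequest(options: HttpRequestOptions): Promise<HttpResponse>;
}

// Assumed set of options that this particular client understands.
const SUPPORTED_KEYS: string[] = ['url', 'method', 'headers', 'body'];

// Separates the options a client supports from the ones it will drop.
function splitOptions(options: HttpRequestOptions): {
    supported: Record<string, unknown>;
    ignored: string[];
} {
    const supported: Record<string, unknown> = {};
    const ignored: string[] = [];
    for (const key of Object.keys(options)) {
        if (SUPPORTED_KEYS.indexOf(key) !== -1) {
            supported[key] = options[key];
        } else {
            ignored.push(key);
        }
    }
    return { supported, ignored };
}

// Stand-in client that performs no network I/O; it only echoes back
// the options it would have forwarded to a real HTTP library.
class StubHttpClient implements BaseHttpClient {
    async sendRequest(options: HttpRequestOptions): Promise<HttpResponse> {
        const { supported } = splitOptions(options);
        return { statusCode: 200, headers: {}, body: JSON.stringify(supported) };
    }
}
```

With this shape, a call such as `splitOptions({ url: 'https://example.com', http2: true })` compiles thanks to the index signature, and `http2` ends up in `ignored` rather than causing a type or runtime error. Whether unsupported options should be silently dropped or logged is an open design choice.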

Shouldn't we just do this in a breaking change instead? Feels weird to try and make it still respect got-scraping's interface to me...


It doesn't seem like that much of a hassle. But if it is, I'm totally open to just breaking it. Wait for the PR 🙂

We will most likely do that later this year, but support for doing this "somehow" should land during Q3, so let's try to make it work now at runtime at the cost of imperfect typing, and improve it in v4, where we can afford to break things.