[request] Time pause between sending requests to a site
Some sites limit the number of requests allowed within a period of time, or simply cannot handle constant requests. I think we need an option to throttle the request rate. MaxConcurrency only limits the number of simultaneous requests.
I haven't extensively tested this yet, but right now it works like this:
A crawl-delay is set automatically if it's defined in the website's robots.txt file. You can't set it manually yet; I could probably implement that.
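This is not the project's actual code, just a minimal sketch of how a crawl-delay can be read from robots.txt using Python's standard urllib.robotparser; the URL and user agent string are placeholders:

```python
from urllib import robotparser

# Hypothetical example: read the Crawl-delay directive for one site.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# crawl_delay() returns the delay in seconds for the given user agent,
# or None if robots.txt does not define one.
delay = rp.crawl_delay("MyCrawler")
if delay is None:
    delay = 0  # fall back to no delay (or a manual default)
print(f"Pausing {delay} seconds between requests to this site")
```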
If a request to a website fails with one of the following HTTP status codes: BadGateway, TooManyRequests, or InternalServerError, it will be retried after 5 minutes; if it fails again, that delay is doubled.
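Roughly, the retry behaviour described above could look like the sketch below (not the project's source); it uses the requests library, and the maximum attempt count is an assumption:

```python
import time
import requests

RETRYABLE = {429, 500, 502}  # TooManyRequests, InternalServerError, BadGateway
INITIAL_DELAY = 5 * 60       # 5 minutes, doubled after each failed attempt

def fetch_with_backoff(url, max_attempts=4):
    delay = INITIAL_DELAY
    for attempt in range(max_attempts):
        resp = requests.get(url, timeout=30)
        if resp.status_code not in RETRYABLE:
            return resp
        if attempt < max_attempts - 1:
            time.sleep(delay)
            delay *= 2  # double the wait time after every failure
    return resp
```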
If multiple requests fail for a certain domain, the entire domain is blocked for some time before retrying.
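The domain-level blocking could work along these lines (again a sketch, not the actual implementation); the failure threshold and block duration are assumptions:

```python
import time
from urllib.parse import urlparse

failures = {}       # domain -> consecutive failure count
blocked_until = {}  # domain -> timestamp when the block expires

FAILURE_THRESHOLD = 3    # assumed number of failures before blocking
BLOCK_SECONDS = 30 * 60  # assumed 30-minute block

def record_failure(url):
    domain = urlparse(url).netloc
    failures[domain] = failures.get(domain, 0) + 1
    if failures[domain] >= FAILURE_THRESHOLD:
        blocked_until[domain] = time.time() + BLOCK_SECONDS

def is_blocked(url):
    domain = urlparse(url).netloc
    return time.time() < blocked_until.get(domain, 0)
```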
I will probably add an option to define a manual crawl-delay in seconds.
I have now fixed the version issues and implemented a global crawl delay option in config.json.
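For illustration only, a global crawl delay from config.json might be applied like this; the key names in the commented config are guesses, not the project's actual schema:

```python
import json
import time
import requests

# Assumed config.json contents (key names are hypothetical):
# { "maxConcurrency": 4, "crawlDelay": 2 }
with open("config.json") as f:
    config = json.load(f)

delay = config.get("crawlDelay", 0)  # global delay in seconds between requests

urls = ["https://example.com/a", "https://example.com/b"]
for url in urls:
    requests.get(url, timeout=30)
    time.sleep(delay)  # pause before sending the next request
```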