[FEATURE REQ] Rate limit requests to the same host
Is your feature request related to a problem? Please describe.
I've got a README.md that has two links to Hacker News.
For the past couple of days, linkspector has failed with a 503 on the second URL in the file, regardless of the order of the links. Linkspector succeeds when either of the URLs is commented out.
It therefore seems that the server responds with a 503 because linkspector fires the requests too quickly, or something along those lines.
Describe the solution you'd like
Perhaps linkspector could wait a (configurable) amount of time between requests to the same host.
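A minimal sketch of such a per-host wait, assuming Node.js 14+ (global `URL`, `??`). Everything here is hypothetical and not part of linkspector's API: the names `createHostThrottle`, `reserve`, `throttle` and `minIntervalMs`, and the 2000 ms default, which is an arbitrary placeholder that would become the configurable option. The idea is to record, per hostname, the earliest time the next request may start, so requests to the same host are spaced by a fixed interval while requests to other hosts are not delayed at all.

```javascript
// Hypothetical helper (not part of linkspector): enforces a minimum interval
// between the *start* of consecutive requests to the same hostname.
// `now` is injectable so the scheduling logic can be tested deterministically.
function createHostThrottle(minIntervalMs = 2000, now = Date.now) {
  // hostname -> timestamp (ms) at which the next request to it may start
  const nextSlot = new Map()

  // Synchronously reserve a start slot for `url` and return how many ms the
  // caller must wait. Reserving before any await means concurrent callers in
  // the same batch each get a distinct, correctly spaced slot.
  function reserve(url) {
    const host = new URL(url).hostname
    const t = now()
    const start = Math.max(t, nextSlot.get(host) ?? 0)
    nextSlot.set(host, start + minIntervalMs)
    return start - t
  }

  // Async wrapper: resolves once the reserved slot has arrived.
  async function throttle(url) {
    const wait = reserve(url)
    if (wait > 0) {
      await new Promise((resolve) => setTimeout(resolve, wait))
    }
    return wait
  }

  return { reserve, throttle }
}
```

In `checkHyperlinks()`, one would presumably create a single throttle per run and call `await throttle(url)` just before `browser.newPage()`, in place of the random delay, where `url` is whichever property of `link` holds the target address. Unlike a random delay, this spacing is deterministic, so it does not depend on luck to keep two same-host requests apart.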
Everything works fine if I change `batchSize` from 100 to 1 in `checkHyperlinks()` (`linkspector/lib/batch-check-links.js`, line 49 at commit 91e6093).
Keeping `batchSize` at 100 and adding a random delay of up to 10 seconds before each request, à la https://stackoverflow.com/a/45010143/2097, also works:
```javascript
// Returns a promise that resolves after the specified number of ms
function delay(ms) {
  return new Promise((resolve) => {
    setTimeout(resolve, ms)
  })
}

async function checkHyperlinks(nodes, options = {}, filePath) {
  const { batchSize = 100, retryCount = 3, aliveStatusCodes } = options
  ...
  for (let i = 0; i < tempArray.length; i += batchSize) {
    const batch = tempArray.slice(i, i + batchSize)
    const promises = batch.map(async (link) => {
      // Wait a random 0 to 10 s so the links in a batch don't all fire at once
      await delay(Math.random() * 10000)
      const page = await browser.newPage()
      ...
```
A random delay of up to 1 second, however, is not enough.
Or maybe URLs to the same host should always go into separate batches?
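To illustrate the separate-batches idea, here is a hypothetical `batchByDistinctHost()` helper (not part of linkspector) that could stand in for the plain `tempArray.slice()` chunking. It is a first-fit sketch: each URL goes into the first batch that still has room and does not yet contain that URL's hostname; otherwise a new batch is opened.

```javascript
// Hypothetical helper (not part of linkspector): split `urls` into batches of
// at most `batchSize` such that no batch contains two URLs with the same host.
function batchByDistinctHost(urls, batchSize = 100) {
  const batches = [] // each entry: { urls: string[], hosts: Set<string> }
  for (const url of urls) {
    const host = new URL(url).hostname
    // First-fit: reuse the first batch with room that lacks this host.
    let batch = batches.find(
      (b) => b.urls.length < batchSize && !b.hosts.has(host)
    )
    if (!batch) {
      batch = { urls: [], hosts: new Set() }
      batches.push(batch)
    }
    batch.urls.push(url)
    batch.hosts.add(host)
  }
  return batches.map((b) => b.urls)
}
```

With the two Hacker News links plus one other URL, the first link and the other URL share a batch and the second Hacker News link lands in its own batch. Note that this only spaces same-host requests by however long a batch takes to finish, and it assumes batches are processed one after another; for a small file that gap may still be too short, so it would likely need to be combined with a per-host delay.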
Example README.md (I've since updated mine to contain only a single Hacker News link):
```markdown
## Quotes
These quotes highlight the goal and status of this repository.
[kachnuv_ocasek](https://news.ycombinator.com/item?id=36354589)
[arghwhat](https://news.ycombinator.com/item?id=36354464)
```