Domain Rate Limiting Frontier Check
ikreymer commented
A way to limit how many crawlers are on a specific domain at a time.
The basic idea is a Redis key such as `d:<domain>` that is incremented for each automation started on that domain and decremented when the automation finishes (or exits). The key can also have an auto-expiry as a safeguard, with the expiry refreshed on each increment.
When checking the frontier, if `d:<domain>` is at the max, that URL is requeued.
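
A rough sketch of how this could look, not a final design. The names `MAX_PER_DOMAIN`, `KEY_TTL_SECONDS`, `try_acquire`, `release` and `next_url` are hypothetical, and `FakeRedis` is an in-memory stand-in that mimics only the `INCR`, `DECR` and `EXPIRE` commands so the example runs without a server. The frontier is assumed to be a simple queue of URLs.

```python
import time
from collections import deque
from urllib.parse import urlsplit


class FakeRedis:
    """In-memory stand-in exposing only incr/decr/expire (illustration only)."""

    def __init__(self):
        self._values = {}
        self._expires_at = {}

    def _purge(self, key):
        # Drop the key if its expiry deadline has passed.
        deadline = self._expires_at.get(key)
        if deadline is not None and time.monotonic() >= deadline:
            self._values.pop(key, None)
            self._expires_at.pop(key, None)

    def incr(self, key):
        self._purge(key)
        self._values[key] = self._values.get(key, 0) + 1
        return self._values[key]

    def decr(self, key):
        self._purge(key)
        self._values[key] = self._values.get(key, 0) - 1
        return self._values[key]

    def expire(self, key, seconds):
        if key not in self._values:
            return False
        self._expires_at[key] = time.monotonic() + seconds
        return True


MAX_PER_DOMAIN = 2     # hypothetical limit on concurrent automations per domain
KEY_TTL_SECONDS = 300  # hypothetical safety expiry for the counter key


def domain_key(url):
    """Build the d:<domain> key for a URL."""
    return "d:" + (urlsplit(url).hostname or "")


def try_acquire(client, url):
    """Increment the domain counter; back out if that exceeds the max."""
    key = domain_key(url)
    if client.incr(key) > MAX_PER_DOMAIN:
        client.decr(key)  # over the limit, undo our increment
        return False
    client.expire(key, KEY_TTL_SECONDS)  # refresh expiry on each increment
    return True


def release(client, url):
    """Decrement the counter when the automation finishes or exits."""
    key = domain_key(url)
    if client.decr(key) < 0:
        client.incr(key)  # key had already expired; do not go negative


def next_url(client, frontier):
    """Pop one URL; requeue it and return None if its domain is at max."""
    url = frontier.popleft()
    if try_acquire(client, url):
        return url
    frontier.append(url)
    return None
```

Incrementing first and backing out on overflow avoids a separate read-then-write step, so two crawlers checking at the same moment cannot both slip under the limit. With a real server, a redis-py client (`redis.Redis`) could be passed in place of `FakeRedis`, since it provides `incr`, `decr` and `expire` methods with the same shape.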