webrecorder/autobrowser

Domain Rate Limiting Frontier Check


Add a way to limit how many crawlers are active on a specific domain at a time.
The basic idea is a d:<domain> Redis key that is incremented for each automation started on that domain and decremented when the automation finishes (or exits). The key can also have an auto-expiry as a safeguard in case a decrement is missed, and the expiry is refreshed on each increment.
When checking the frontier, if d:<domain> is already at the max, that URL is requeued.
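
A rough sketch of how the check could work, to make the proposal concrete. This is not existing autobrowser code: `MAX_PER_DOMAIN`, `KEY_TTL_SECONDS`, `try_acquire`, `release` and `next_url` are hypothetical names, and `InMemoryCounters` is a stand-in so the example runs without a server. It only mimics the `incr`, `decr` and `expire` calls that a redis-py client also provides. One deviation from the wording above: the sketch increments first and rolls back if the limit is exceeded, instead of reading the key and then incrementing, so that two crawlers checking at the same moment cannot both slip in.

```python
from collections import deque
from urllib.parse import urlsplit

MAX_PER_DOMAIN = 2      # hypothetical limit, not an existing setting
KEY_TTL_SECONDS = 300   # hypothetical safety expiry for the d:<domain> key


class InMemoryCounters:
    """Stand-in for the few Redis client calls used below (incr/decr/expire)."""

    def __init__(self):
        self.values = {}
        self.ttls = {}

    def incr(self, key):
        self.values[key] = self.values.get(key, 0) + 1
        return self.values[key]

    def decr(self, key):
        self.values[key] = self.values.get(key, 0) - 1
        return self.values[key]

    def expire(self, key, seconds):
        # Only records the TTL; a real Redis server would drop the key later.
        self.ttls[key] = seconds
        return True


def domain_key(url):
    """Build the d:<domain> key from the URL's hostname."""
    return "d:" + (urlsplit(url).hostname or "")


def try_acquire(client, url):
    """Claim a slot on the URL's domain; return False if it is at the max."""
    key = domain_key(url)
    count = client.incr(key)
    client.expire(key, KEY_TTL_SECONDS)  # expiry refreshed on every increment
    if count > MAX_PER_DOMAIN:
        client.decr(key)  # roll back: the domain is already full
        return False
    return True


def release(client, url):
    """Give the slot back when the automation finishes or exits."""
    client.decr(domain_key(url))


def next_url(client, frontier):
    """Pop URLs until one can start; requeue those whose domain is at the max."""
    for _ in range(len(frontier)):
        url = frontier.popleft()
        if try_acquire(client, url):
            return url
        frontier.append(url)  # domain at max, so the URL goes back on the queue
    return None
```

With a real server, a `redis.Redis()` client could presumably be passed in place of `InMemoryCounters`, since it exposes the same three calls. The frontier is modelled here as a plain `deque`, which is also an assumption about how requeueing would be done.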