ArchiveTeam/terroroftinytown

Account for network delays during sleep


In tinyback, rate limiting was a bit more precise and optimized (in my opinion):
the rate limit was a tuple, defined here: https://github.com/ArchiveTeam/tinyback/blob/master/tinyback/services.py#L48
and the implementation was here: https://github.com/ArchiveTeam/tinyback/blob/master/tinyback/__init__.py#L132
The thing is, if I take is.gd for example, you can scrape 60 URLs per minute, so with terroroftinytown-client-grab the delay is implemented as a fixed 1 s sleep between requests.
Now think on a one-day timeframe: with tinyback you could scrape 86,400 URLs/day.
With terroroftinytown-client-grab, each iteration costs sleep(1) plus the RTT of the URL request, so you cannot fit 86,400 iterations into a day; depending on the RTT, you may only scrape around 80,000 to 85,000 URLs.
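
To illustrate what I mean, here is a minimal sketch of a limiter that subtracts the time already spent on the request from the sleep. This is not tinyback's actual code: the `RateLimiter` class, its `wait()` method, and the `(max_requests, period)` reading of the rate limit tuple are my assumptions for illustration. The `clock` and `sleep` parameters are only there so the timing can be checked without really sleeping.

```python
import time


class RateLimiter:
    """Hypothetical limiter: at most max_requests per period (seconds).

    Sleeps only for the part of the interval that the previous request
    (network RTT, parsing, ...) has not already consumed.
    """

    def __init__(self, max_requests, period, clock=time.monotonic, sleep=time.sleep):
        self.interval = period / max_requests  # e.g. 60 requests / 60 s -> 1.0 s
        self._clock = clock
        self._sleep = sleep
        self._next_slot = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed; return the time slept."""
        now = self._clock()
        if self._next_slot is None:
            self._next_slot = now  # first request may start immediately
        delay = self._next_slot - now
        if delay > 0:
            self._sleep(delay)
            # On schedule: keep a fixed cadence from the previous slot.
            self._next_slot += self.interval
        else:
            # Behind schedule (slow request): restart from now, never burst.
            self._next_slot = now + self.interval
        return max(delay, 0.0)


# Usage sketch (fetch() is a placeholder, not a real function):
#
#     limiter = RateLimiter(60, 60.0)   # is.gd: 60 URLs per minute
#     for url in urls:
#         limiter.wait()
#         fetch(url)
```

With a 0.2 s RTT, `wait()` sleeps about 0.8 s instead of a full second, so each iteration still takes 1 s in total and the daily count stays close to 86,400 as long as requests finish within the interval.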