Scrape TTL?
sherzberg opened this issue · 3 comments
sherzberg commented
Any thoughts on a feature where repeated calls with the same URL would just return the file already on disk? A TTL parameter on the grab could determine how long the scrape/sketch/html stays cached.
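A minimal sketch of the file-on-disk idea, assuming each capture is saved under a local directory with a filename derived from the URL: reuse the file if its modification time is within the TTL, otherwise scrape again. The cache_path layout, the grab signature, and the scrape callable are hypothetical illustrations, not sketchy's actual code.

```python
import hashlib
import os
import time
from typing import Callable


def cache_path(capture_dir: str, url: str, suffix: str = ".html") -> str:
    """Hypothetical layout: one file per URL, named by the SHA-256 of the URL."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return os.path.join(capture_dir, digest + suffix)


def grab(url: str, ttl: int, capture_dir: str,
         scrape: Callable[[str, str], None]) -> str:
    """Return the capture path for url, re-scraping only when the cached
    file is missing or older than ttl seconds."""
    path = cache_path(capture_dir, url)
    try:
        age = time.time() - os.path.getmtime(path)
    except FileNotFoundError:
        age = None  # nothing cached for this URL yet
    if age is None or age > ttl:
        scrape(url, path)  # hypothetical scraper that writes the capture to path
    return path
```

Using the file's mtime as the timestamp avoids any extra state, at the cost of tying cache freshness to the filesystem.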
sbehrens commented
That's a great idea; I'll add this as a feature request.
zmallen commented
This could be achieved with Redis. Just add a TTL configuration option to https://github.com/Netflix/sketchy/blob/master/config-default.py, then, for every URL scraped, store an entry in Redis with the URL as the key, the file path as the value, and the TTL as the key's expiry.
Then, for every request, check Redis for that URL with redis.get(url): if it returns None, scrape; otherwise, return the cached file path.
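A minimal sketch of the Redis lookup described above. It assumes the store object offers get(key) and set(key, value, ex=seconds); a redis-py client (redis.Redis()) provides both, so it could be passed in directly. To keep the example runnable without a Redis server, it includes a small in-memory stand-in. The cached_scrape name and the scrape callable are hypothetical, and the ttl value would come from the configuration option suggested above.

```python
import time
from typing import Callable, Optional


class InMemoryTTLStore:
    """Stand-in for a Redis client so the sketch runs without a server.

    Mimics only the two calls used below: get(key) and set(key, value, ex=seconds).
    """

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, key: str) -> Optional[bytes]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # expired, like a Redis key past its TTL
            return None
        return value

    def set(self, key: str, value, ex: Optional[int] = None) -> bool:
        expires_at = time.monotonic() + ex if ex is not None else None
        # Redis stores strings as bytes; mirror that here
        stored = value.encode("utf-8") if isinstance(value, str) else value
        self._data[key] = (stored, expires_at)
        return True


def cached_scrape(store, url: str, ttl: int, scrape: Callable[[str], str]) -> str:
    """Return the capture file path for url, scraping only on a cache miss."""
    cached = store.get(url)
    if cached is not None:
        # redis-py returns bytes unless the client uses decode_responses=True
        return cached.decode("utf-8") if isinstance(cached, bytes) else cached
    path = scrape(url)  # hypothetical scraper returning the saved file's path
    store.set(url, path, ex=ttl)  # key expires after ttl seconds
    return path
```

Letting Redis expire the key means no separate timestamp check is needed: once the TTL passes, redis.get(url) returns None and the next request scrapes again. Note that Redis rejects an expiry of 0, so the TTL should be a positive number of seconds.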
sbehrens commented
Closing. If someone wants to add this feature, please reopen.