Scrape TTL?

Question

sherzberg opened this issue 10 years ago · 3 comments

Any thoughts on a feature where repeated calls with the same URL would just return the file already on disk? A TTL parameter on the grab could determine how long the scrape/sketch/HTML stays cached.

Answer 1 · 2014-09-09T19:56:49.000Z

That's a great idea. I'll add it as a feature request.

Answer 2 · 2014-10-23T15:21:54.000Z

This could be achieved with Redis. Add a TTL configuration option in https://github.com/Netflix/sketchy/blob/master/config-default.py, then for every URL scraped, store that URL in Redis as the key and the file path as the value, with the TTL as the key's expiry.

Then, for every request, check Redis for that URL with redis.get(url). If it returns None, scrape; otherwise, return the file path stored in Redis.
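A minimal sketch of that lookup, not Sketchy's actual code. The names `cached_grab`, `InMemoryStore`, the `sketchy:capture:` key prefix, and the `scrape` callable are all hypothetical. The store only needs `get(name)` and `setex(name, time, value)`, which redis-py's client provides; the in-memory stand-in here just lets the example run without a Redis server.

```python
import time


class InMemoryStore:
    """Hypothetical stand-in for a Redis client: only get() and setex()."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expiry timestamp)
        self._clock = clock  # injectable so expiry can be tested

    def setex(self, name, ttl_seconds, value):
        # Same argument order as redis-py's setex(name, time, value).
        self._data[name] = (value, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._data.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry outlived its TTL: drop it and report a miss.
            del self._data[name]
            return None
        return value


def cached_grab(url, store, scrape, ttl_seconds=300):
    """Return the cached file path for url, scraping only on a cache miss.

    `scrape` is an assumed callable that takes a URL and returns the path
    of the file it wrote to disk.
    """
    key = "sketchy:capture:" + url  # hypothetical key prefix
    cached = store.get(key)
    if cached is not None:
        # A real Redis client returns bytes unless decode_responses=True.
        return cached.decode() if isinstance(cached, bytes) else cached
    path = scrape(url)
    store.setex(key, ttl_seconds, path)  # the key expires after the TTL
    return path
```

With a real server, something like `redis.Redis(decode_responses=True)` could be passed as `store` in place of `InMemoryStore`. Note that the cached path is only useful while the file still exists on disk, so a real implementation would probably also want to check for the file before returning it.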

Answer 3 · 2015-03-25T16:37:53.000Z

Closing. If someone wants to add this feature, please reopen.