Netflix-Skunkworks/sketchy

Scrape TTL?

sherzberg opened this issue · 3 comments

Any thoughts on a feature where repeated calls with the same URL would simply return the file already on disk? A TTL parameter on the grab could determine how long the scrape/sketch/HTML is cached.

That's a great idea; I'll add this as a feature request.

This could be achieved with Redis. Add a TTL configuration option to https://github.com/Netflix/sketchy/blob/master/config-default.py, then, for every URL scraped, store the URL in Redis as the key and the file path as the value, with an expiry equal to the TTL.

Then, on every request, check Redis for that URL with redis.get(url): if it returns None, scrape; otherwise, return the file path stored in Redis.
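A minimal sketch of that lookup-then-scrape flow, in Python since sketchy's config is Python. The names `get_or_scrape`, `scrape`, and `InMemoryStore` are hypothetical and not part of sketchy. The store is assumed to expose redis-py-style `get(name)` and `set(name, value, ex=seconds)` methods, where `ex` is what gives each cached entry its TTL. `InMemoryStore` is only a stand-in so the example runs without a Redis server.

```python
import time
from typing import Callable, Dict, Optional, Protocol, Tuple, Union


class KeyValueStore(Protocol):
    """The subset of the redis-py client API this sketch assumes."""

    def get(self, name: str) -> Optional[Union[bytes, str]]: ...

    def set(self, name: str, value: str, ex: Optional[int] = None) -> object: ...


def get_or_scrape(
    store: KeyValueStore,
    url: str,
    ttl: int,
    scrape: Callable[[str], str],
) -> str:
    """Return the cached file path for url, scraping only on a cache miss."""
    cached = store.get(url)
    if cached is not None:
        # redis-py returns bytes unless the client uses decode_responses=True
        return cached.decode() if isinstance(cached, bytes) else cached
    # Cache miss: hypothetical scrape() captures the page and returns its file path
    path = scrape(url)
    # ex=ttl asks the store to drop the key after ttl seconds,
    # so the next request after expiry scrapes again
    store.set(url, path, ex=ttl)
    return path


class InMemoryStore:
    """Hypothetical in-process stand-in for a Redis client, with per-key expiry."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[Union[bytes, str], Optional[float]]] = {}

    def get(self, name: str) -> Optional[Union[bytes, str]]:
        entry = self._data.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            # Expired: behave like Redis and treat the key as absent
            del self._data[name]
            return None
        return value

    def set(self, name: str, value: str, ex: Optional[int] = None) -> bool:
        expires_at = time.monotonic() + ex if ex is not None else None
        self._data[name] = (value, expires_at)
        return True


# Demo: two requests for the same URL within the TTL trigger a single scrape.
scraped = []


def fake_scrape(url: str) -> str:
    # Hypothetical scraper: records the call and returns a made-up file path
    scraped.append(url)
    return f"/tmp/sketchy/capture-{len(scraped)}.png"


store = InMemoryStore()
first = get_or_scrape(store, "http://example.com/", ttl=3600, scrape=fake_scrape)
second = get_or_scrape(store, "http://example.com/", ttl=3600, scrape=fake_scrape)
```

Against a real server, a `redis.Redis()` client from redis-py could be passed as `store` instead. This sketch does not deduplicate concurrent misses: two simultaneous requests for an uncached URL will both scrape.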

Closing. If someone wants to add this feature, please reopen.