Long-lived cache management
softloft38p-michael opened this issue · 2 comments
I'm working on a project wherein I have two caches: one for celery tasks and one for a file index per root path. My question is how best to set these two caches up in a way that individual tasks and indexes have a limited lifetime, but the cache system itself is indefinite.
Currently I have this:
import diskcache

cache_root = '/path_to_caches'
TASK_CACHE = diskcache.FanoutCache(cache_root + '/task_cache', shards=16)
INDEX_CACHE = diskcache.FanoutCache(cache_root + '/index_cache', shards=16)

def get_task_cache(task_id: str):
    task_cache = TASK_CACHE.cache(task_id, expire=259_200)
    task_cache.touch()
    return task_cache

def get_index_cache(root_path: str):
    index_cache = INDEX_CACHE.cache(root_path, expire=7_776_000)
    index_cache.touch()
    return index_cache
An individual task_cache or index_cache is read and written to by multiple celery tasks at the same time.
Some questions I have are:
- Is the above a reasonable way to ensure that a task is cleaned up at most 72 hours after last use and similarly for the index?
- Is there a better way to structure this so an individual task or index gets its own FanoutCache? It would be nice to ensure a corrupt index does not destroy other indexes.
- What is the recommended way to clean up an individual .cache from FanoutCache? Is something like this sufficient:
    task_cache = TASK_CACHE.pop(task_id)
    task_cache.close()
  Or is there a single-function counterpart to .cache?
Is the above a reasonable way to ensure that a task is cleaned up at most 72 hours after last use and similarly for the index?
Not really. Looks strange to me. This’ll just pollute your file system with task and index caches. The expire keyword is for the individual key-value items in the cache, not for the cache overall. Also, I don’t think touch() works that way. You have to touch a key. You don’t touch a cache.
Is there a better way to structure this so an individual task or index gets its own FanoutCache? It would be nice to ensure a corrupt index does not destroy other indexes.
Not really. The expectation is that they would all share a single fanout cache. If you create individual ones, you’ll have to delete them yourself.
What is the recommended way to clean up an individual .cache from FanoutCache?
Fanout cache doesn’t store caches and that’s confusing. That method is simply an easy way to create a cache in a subdirectory. There’s no cache management functionality between the parent/child.
Thanks for the reply!
Not really. Looks strange to me. This’ll just pollute your file system with task and index caches. The expire keyword is for the individual key-value items in the cache, not for the cache overall. Also, I don’t think touch() works that way. You have to touch a key. You don’t touch a cache.
It seems I can make it work by switching from .cache to a key with an Index storing all the same data: is that a better approach?