grantjenks/python-diskcache

Long-lived cache management

softloft38p-michael opened this issue · 2 comments

I'm working on a project in which I have two caches: one for celery tasks and one for a file index per root path. My question is how best to set these two caches up so that individual tasks and indexes have a limited lifetime, but the cache system itself persists indefinitely.

Currently I have this:

import diskcache

cache_root = '/path_to_caches'
TASK_CACHE = diskcache.FanoutCache(cache_root + '/task_cache', shards = 16)
INDEX_CACHE = diskcache.FanoutCache(cache_root + '/index_cache', shards = 16)

def get_task_cache(task_id: str):
    task_cache = TASK_CACHE.cache(task_id, expire=259_200)
    task_cache.touch()
    return task_cache

def get_index_cache(root_path: str):
    index_cache = INDEX_CACHE.cache(root_path, expire=7_776_000)
    index_cache.touch()
    return index_cache

An individual task_cache or index_cache is read from and written to by multiple celery tasks at the same time.

Some questions I have are:

  • Is the above a reasonable way to ensure that a task is cleaned up at most 72 hours after last use and similarly for the index?
  • Is there a better way to structure this so an individual task or index gets its own FanoutCache? It would be nice to ensure a corrupt index does not destroy other indexes.
  • What is the recommended way to clean an individual .cache from FanoutCache? Is something like this sufficient:
    task_cache = TASK_CACHE.pop(task_id)
    task_cache.close()
    Or is there a single-function counterpart to .cache?

Is the above a reasonable way to ensure that a task is cleaned up at most 72 hours after last use and similarly for the index?

Not really. Looks strange to me. This’ll just pollute your file system with task and index caches. The expire keyword is for the individual key-value items in the cache, not for the cache overall. Also, I don’t think touch() works that way. You have to touch a key. You don’t touch a cache.

Is there a better way to structure this so an individual task or index gets its own FanoutCache? It would be nice to ensure a corrupt index does not destroy other indexes.

Not really. The expectation is that they would all share a single fanout cache. If you create individual ones, you’ll have to delete them yourself.

What is the recommended way to clean an individual .cache from FanoutCache?

FanoutCache doesn’t store caches, and that’s confusing. The .cache method is simply an easy way to create a cache in a subdirectory. There’s no cache management functionality between the parent and the child.

Thanks for the reply!

Not really. Looks strange to me. This’ll just pollute your file system with task and index caches. The expire keyword is for the individual key-value items in the cache, not for the cache overall. Also, I don’t think touch() works that way. You have to touch a key. You don’t touch a cache.

It seems I can make it work by switching from .cache to a key with an Index storing all the same data: is that a better approach?