How to serialize/unserialize the cache in/from a file?
deeplook opened this issue · 4 comments
Having the cache info is very useful, but I'm missing a way to access the cache itself so I could serialize it and reuse it later. Is there any way to do that already?
>>> f.cache_info()
CacheInfo(hits=8207, misses=1957, current_size=1957, max_size=None,
algorithm=<CachingAlgorithmFlag.LRU: 2>, ttl=None,
thread_safe=True, order_independent=False, use_custom_key=False)
I'm also hoping for some sort of pickling feature in the future!
I may be able to contribute to the project by implementing this feature, if the maintainers are willing to answer questions I may have during development.
After careful consideration, I'm sorry to say that a serialization/deserialization feature cannot be implemented in this library yet. Although it would be a really nice and useful feature, the cons still outweigh the pros, and some challenges must be addressed first.
- Deserialization is an unsafe operation that has caused a large number of vulnerabilities in many programming languages (for example, Python's pickle is unsafe). It must therefore be designed very carefully, especially in a library that a lot of software depends on.
- It is difficult (or even impossible) to keep the serialized cache consistent when the code changes.
For example, given a function foo:
from memoization import cached

@cached
def foo(x):
    return x
One day, we serialize the cache with foo.serialize(...). After several revisions, we change the logic of foo:
@cached
def foo(x):
    return x + 1
If we then deserialize the cache with foo.deserialize(...), we will get wrong results: calling foo(x) returns the stale x instead of x + 1.
If anyone:
- finds a way to keep the serialized cache consistent with the code (for example, raise an error if the code has been changed)
- has the capability and resources to implement a serialization feature free from vulnerabilities
please comment or submit a PR. Thank you!
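One possible way to approach the first point, as a rough sketch only: store a fingerprint of the function's compiled code next to the cached entries and refuse to load them when the fingerprint no longer matches. The helper names (code_fingerprint, dumps_cache, loads_cache) are hypothetical and not part of this library, the cache is modelled as a plain dict, and JSON is used instead of pickle so that loading does not execute arbitrary code.

```python
import hashlib
import json


def code_fingerprint(func):
    """Hash the function's bytecode and constants (hypothetical helper).

    Assumes a plain, undecorated function. Constants that are themselves
    code objects (nested functions, lambdas) have unstable reprs, so this
    sketch is only meant for simple functions.
    """
    code = func.__code__
    payload = code.co_code + repr(code.co_consts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def dumps_cache(func, entries):
    """Serialize an {argument: result} dict together with the fingerprint."""
    return json.dumps({
        "fingerprint": code_fingerprint(func),
        "entries": list(entries.items()),
    })


def loads_cache(func, text):
    """Load the entries, raising an error if the function has changed."""
    data = json.loads(text)
    if data["fingerprint"] != code_fingerprint(func):
        raise ValueError("function code has changed; refusing to load stale cache")
    return {key: value for key, value in data["entries"]}


def foo_v1(x):
    return x


def foo_v2(x):  # the "revised" foo from the example above
    return x + 1


saved = dumps_cache(foo_v1, {1: 1, 2: 2})
restored = loads_cache(foo_v1, saved)  # same code: loads fine

try:
    loads_cache(foo_v2, saved)
    stale_detected = False
except ValueError:
    stale_detected = True  # changed code: the stale cache is rejected
```

Bytecode also changes between Python versions, so this check errs on the side of discarding a cache that might still be valid. It does not notice changes in helper functions that foo calls.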
@lonelyenvoy Have you looked at how joblib.Memory approaches this? https://github.com/joblib/joblib/blob/55d97abd59dbc703579307f9d359870be436ebd1/joblib/memory.py#L672
Joblib also does a better job of dealing with the input args. It takes a sensibly filtered set of the arguments, assembles them into a list, serializes the list with pickle.dumps, hashes the bytes, and uses the hash as the key.
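The key scheme described above might look roughly like the sketch below. This is not joblib's actual code; make_key and its ignore parameter are made-up names for illustration, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
import pickle


def make_key(args, kwargs, ignore=()):
    """Build a cache key: filter the arguments, pickle them, hash the bytes."""
    # Drop keyword arguments that should not influence the result,
    # and sort the rest so the key does not depend on call order.
    filtered_kwargs = sorted(
        (name, value) for name, value in kwargs.items() if name not in ignore
    )
    payload = pickle.dumps([list(args), filtered_kwargs], protocol=4)
    return hashlib.sha256(payload).hexdigest()


# Unhashable arguments such as lists are fine, because only their
# pickled bytes are hashed.
key_a = make_key((1, [2, 3]), {"verbose": True, "n": 5}, ignore=("verbose",))
key_b = make_key((1, [2, 3]), {"verbose": False, "n": 5}, ignore=("verbose",))
key_c = make_key((1, [2, 4]), {"n": 5})
```

Here key_a and key_b are equal because the ignored verbose flag is filtered out, while key_c differs because a positional argument changed. Note that only pickle.dumps is used, which is safe; the security concern raised earlier applies to loading pickled data from untrusted sources.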
Sorry, how do you pickle the cache? I get that you might not want to have serde code in the library, but folks should be able to do that themselves via exposed APIs to get and set the cache.