tkem/cachetools

Immutability of the cached value

Closed this issue · 3 comments

At the moment, if I mutate the result of the function being cached, the next time I call the function I receive the value after mutation. Is this the correct and expected behaviour? I'm not sure if it's semantically correct in all the cases. If the result of the function is being created inside the function, I'd expect that mutating the result outside won't affect any future calls and will always receive new instance of (possibly mutable) type. On the other hand there can be cases when the function just returns some global object and then it might be just fine that mutations affect the object's state.

So, wouldn't it be a good idea to add a parameter to cache classes/functions that would specify if we should return a copy (or deep copy) of the cached value instead of the actual value?

While in theory returning a deep copy should return equivalent objects, it will break for singletons:

>>> import copy 
... import cachetools.func 
...  
... SENTINEL = object() 
...  
...  
... @cachetools.func.lru_cache 
... def my_function(value): 
...     return copy.deepcopy(value) 
...  
...  
... return_value = my_function(SENTINEL) 
... assert return_value is SENTINEL, "values are different"                                                                                                                       
Traceback (most recent call last):
  File "<stdin>", line 13, in <module>
AssertionError: values are different

values are different

This is a nasty surprise and will break any code that relies on object identity. Shallow copies won't work as intended for nested dictionaries or lists. I'm sure there's a way around it, somehow.

tkem commented

I was very much aware of this mutability issue, especially with respect to getsizeof(), i.e. mutating a cached value may affect its size as reported by Cache.getsizeof(value), but the containing cache will not notice. However, this rarely seems to be an issue in practice, and apparently cache item mutability is also acceptable for the standard library's @functools.lru_cache. So for the sake of simplicity and efficiency, no copy is being made. I also don't see much use for an extra parameter - users of a cache can always make copies themselves, and probably know better how to make and use a copy of a cached item. However, these kinds of mutability issues should probably be mentionaed in the docs if they aren't already.

That makes sense. A short notice in docs would be nice, though.