tkem/cachetools

Ignore KeyError in cache

Closed this issue · 3 comments

Although cachetool is not thread safe, I find a key point is that cachetools will raise keyerror when it runs at del or pop operations if a key is just expired by other threads. Since these key-values are no longer cached, a more proper way is to ignore keyerror. I think it would be a solution to make cachetools more stable in multithreaded code, in consideration of the performance decrease with lock.

Here are some examples for modification

class Cache(collections.abc.MutableMapping):
    def __delitem__(self, key):
        size = self.__size.pop(key)
        #del self.__data[key]
        #self.__currsize -= size
        try:
            del self.__data[key]
        except KeyError:
            pass
        else:
            self.__currsize -= size

class LRUCache(Cache):
    def __delitem__(self, key, cache_delitem=Cache.__delitem__):
        cache_delitem(self, key)
        #del self.__order[key]
        self.__order.pop(key, None)

class TTLCache(Cache):
    def __delitem__(self, key, cache_delitem=Cache.__delitem__):
        cache_delitem(self, key)
        #link = self.__links.pop(key)
        #link.unlink()
        #if link.expire < self.__timer():
        #    raise KeyError(key)
        try:
            link = self.__links.pop(key)
        except KeyError:
            pass
        else:
            link.unlink()
            if link.expire < self.__timer():
                pass
tkem commented

@onebula: I think I see your point, however I always thought that those KeyErrors that might show up when using caches without proper locking were more of a feature, reminding people that what they're doing was basically not thread-safe...

IMHO the goal shouldn't be to "make cachetools appear more stable in multithreaded code" (italics by me), but to enforce use of proper locking for concurrent access. So for multithreaded programs, I think this might just give a false sense of security.

However, since it's somewhat natural for a cache that items just "vanish" at some point, I feel that not raising KeyError from del does have a certain charm. However, this would be a breaking change I'm not sure is worth introducing at this point.

As many users found, multithread is a common case that cachetools fails. Maybe other cases in single thread code could lead to the same situation.
The major reason is that the delitem function could break due to KeyError:

  1. As shown in examples, replacing del key by cache.pop(key, None) or just to ignore KeyError in delitem related functions, while the other major function of cachetools still keeps.
  2. I suggest using a global configuration to handle this situation, and printing warning to stdout or log instead of stop service.
    This could be more natural for cache. Other users may also benefit from this modification.
tkem commented

As stated in the docs:

Please be aware that all these classes are not thread-safe. Access to a shared cache from multiple threads must be properly synchronized, e.g. by using one of the memoizing decorators with a suitable lock object.

"Not thread-safe" means that a cache's internal data structure may become corrupt when accessed concurrently without proper synchronization, leading to memory leaks and other nasty stuff. Raising KeyError is only an indication that far worse things may have already happened without you noticing. So eliminating these is not an option.