tkem/cachetools

When two @cachedmethod()-decorated methods are called with the same arguments, results are commingled

paulmelnikow opened this issue · 13 comments

Hi! Thanks for this useful library 😃

I encountered some unexpected behavior. I'm using cachedmethod() on several methods in the same class. My expectation is that these could all share a cache object while caching their results independently of each other. If two different cached methods were called with the same arguments, I'd expect a mechanism inside the decorator to determine that the calls are for different methods and therefore should not share each other's results.

What I'm seeing instead is that if one cached method is called, and then another is called with the same arguments, the results of the first method are returned.

import operator
import unittest

from cachetools import cachedmethod


class Cached(object):

    def __init__(self, cache, count=0, other_count=5):
        self.cache = cache
        self.count = count
        self.other_count = other_count

    @cachedmethod(operator.attrgetter('cache'))
    def get(self, value):
        count = self.count
        self.count += 1
        return count

    @cachedmethod(operator.attrgetter('cache'))
    def get_other(self, value):
        other_count = self.other_count
        self.other_count += 1
        return other_count


class CachedMethodTest(unittest.TestCase):

    def test_that_methods_are_cached_independently(self):
        cached = Cached({})
        self.assertEqual(cached.get(0), 0)
        self.assertEqual(cached.get(0), 0)
        self.assertEqual(cached.get(1), 1)
        self.assertEqual(cached.get(1), 1)
        # These assertions fail: get_other() returns the values
        # cached by get(), because both methods produce the same keys.
        self.assertEqual(cached.get_other(0), 5)
        self.assertEqual(cached.get_other(0), 5)
        self.assertEqual(cached.get_other(1), 6)
        self.assertEqual(cached.get_other(1), 6)
        self.assertEqual(cached.get_other(2), 7)
        self.assertEqual(cached.get(2), 2)
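
To make the commingling concrete, here is a minimal sketch (a hypothetical Demo class, using a plain dict as the cache) showing that the default key is built from the method arguments alone, so both methods hit the same cache entry:

import operator

from cachetools import cachedmethod

class Demo(object):

    def __init__(self):
        self.cache = {}

    @cachedmethod(operator.attrgetter('cache'))
    def get(self, value):
        return 'computed by get'

    @cachedmethod(operator.attrgetter('cache'))
    def get_other(self, value):
        return 'computed by get_other'

obj = Demo()
print(obj.get(0))        # 'computed by get'
print(obj.get_other(0))  # also 'computed by get' -- the entry cached by get(0)
print(obj.cache)         # a single entry, keyed only by the argument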

It seems to me that this is a bug. Is that correct?

If so, I'll open a PR with the failing test case and see if I can fix it.

Isn't this functionality already covered by providing your own key function to cachedmethod? You could do something along the lines of:

from cachetools.keys import hashkey
...
    @cachedmethod(cache=operator.attrgetter('cache'), key=lambda *args, **kwargs: hashkey('get', *args, **kwargs))
    def get(self, value):
        count = self.count
        self.count += 1
        return count
...

I think this is more flexible than making it the default, as you suggested in your PR. If you don't want to share a cache, the default would put unnecessary information into each cached item, making the cache much bigger than actually needed.

Hmm, a pointer is not very large. The convenience of this working out of the box might be worth making this the default.

I may not be understanding what you're suggesting with key. Placing

@cachedmethod(cache=operator.attrgetter('cache'), key=lambda *args, **kwargs: hashkey('get', *args, **kwargs))

on a pair of methods (as in the test) does not keep the cached values separate.

@paulmelnikow one pointer is not very large, but having as many of them as there are cached items adds up...

Regarding my suggestion, for the get method you would use:

@cachedmethod(cache=operator.attrgetter('cache'), key=lambda *args, **kwargs: hashkey('get', *args, **kwargs))

for the get_other you would use:

@cachedmethod(cache=operator.attrgetter('cache'), key=lambda *args, **kwargs: hashkey('get_other', *args, **kwargs))

So you are just injecting the name of the method manually. Got it?

A full example would be:

import operator
from cachetools.keys import hashkey
from cachetools import cachedmethod

class Cached(object):

    def __init__(self, count=0, other_count=5):
        self.cache = {}
        self.count = count
        self.other_count = other_count

    @cachedmethod(cache=operator.attrgetter('cache'), key=lambda *args, **kwargs: hashkey('get', *args, **kwargs))
    def get(self, value):
        count = self.count
        self.count += 1
        return count

    @cachedmethod(cache=operator.attrgetter('cache'), key=lambda *args, **kwargs: hashkey('get_other', *args, **kwargs))
    def get_other(self, value):
        other_count = self.other_count
        self.other_count += 1
        return other_count

obj = Cached()
assert 0 == obj.get(42)
assert 5 == obj.get_other(42)
print(obj.cache)

resulting in

{('get', 42): 0, ('get_other', 42): 5}

I see. My use case has a class with dozens of cached methods. I wouldn't want to retype the method name for each one; that would be very error-prone.

For many use cases, even a pointer per result is going to be much smaller than the cached result. If the cache is going to contain many entries and that optimization is really needed, a key function could be provided that omits the method pointer.

Alternatively, perhaps an additional keyword argument could be added to turn this behavior on or off.

@paulmelnikow, have you considered what happens if someone wants to store the cache, maybe as JSON or some other non-pickle format? Having pointers in your cache would be really cumbersome to work with. You could, for instance, use method.__name__ in your PR. That would, of course, mean even bigger caches, but storing the cache would be easier. So it really depends on the use case, which is why I think a default implementation like the one you suggest is not that good, since it would be hard to deactivate.

What could be done to make things more flexible is to change the cachedmethod decorator so that it passes the decorated method object to the key function. In that case, the default hashkey implementation would just drop the first argument, but you could use it as part of the key or call __name__ on it.

In this case, you would only need to write a key function like:

def my_hashkey(method, *args, **kwargs):
    # Prepend the method object itself to the default key.
    return (method,) + hashkey(*args, **kwargs)

and pass it in your dozens of methods like:

@cachedmethod(cache=operator.attrgetter('cache'), key=my_hashkey)

If anyone would rather have the name of the function in the cache for differentiation, they would write:

def other_hashkey(method, *args, **kwargs):
    # Use the method's name instead of the method object itself.
    return (method.__name__,) + hashkey(*args, **kwargs)

What do you think about that suggestion?

That seems like a reasonable approach. It solves the problem for me.

To avoid the boilerplate, I'm already wrapping the call to @cachedmethod() in my own decorator, so it's easy to add another parameter to it.
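
For reference, a minimal sketch of such a wrapper decorator (the name methodcached and the use of __name__ as the key prefix are my own choices, and it assumes the cachetools API discussed here, where the key function receives only the method arguments):

import functools
import operator

from cachetools import cachedmethod
from cachetools.keys import hashkey

def methodcached(cache=operator.attrgetter('cache')):
    # Like @cachedmethod, but prefixes each key with the decorated
    # method's name, so methods sharing a cache stay independent.
    def decorator(method):
        key = functools.partial(hashkey, method.__name__)
        return cachedmethod(cache, key=key)(method)
    return decorator

With that in place, @methodcached() can replace each @cachedmethod(operator.attrgetter('cache')) without retyping the method name.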

Cool, what does @tkem think?

tkem commented

Using the method as an implicit key parameter was actually the default until cachetools v2.0. It was dropped for consistency with the plain-function @cached when the key parameter was introduced, IIRC. So a cache shared by multiple methods should be handled similarly to what is described here. The @cachedmethod docs are apparently missing a similar example/note/reference, so this should clearly be added. Thanks!
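
For reference, a sketch of how the shared-cache pattern from the @cached documentation carries over to @cachedmethod, using functools.partial instead of a lambda (the 'get'/'get_other' tags mirror the example above):

import operator
from functools import partial

from cachetools import cachedmethod
from cachetools.keys import hashkey

class Cached(object):

    def __init__(self):
        self.cache = {}

    @cachedmethod(operator.attrgetter('cache'), key=partial(hashkey, 'get'))
    def get(self, value):
        ...

    @cachedmethod(operator.attrgetter('cache'), key=partial(hashkey, 'get_other'))
    def get_other(self, value):
        ...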

Using the same cache for dozens of (independent?) methods I find somewhat questionable, but that depends on your specific use case, of course. A custom decorator wrapping @cachedmethod would be a good approach for such a case.

Thanks for elaborating.

Without going into detail on the domain, suffice it to say that the cached methods compute different properties of a geometric object. To avoid code duplication, they are parameterized: rather than writing separate methods for left_thing and right_thing, side is a parameter.

The computations are fairly slow. I don't want to compute both sides unless both sides are needed (sometimes I only run one side or the other), and I definitely don't want to compute the same side more than once.

The methods look sort of like this:

@cachedmethod()
def compute_thing(self, side):
    ...

@cachedmethod()
def compute_other_thing(self, side):
    ...

@cachedmethod()
def compute_still_another_thing(self, side):
    # Combine results of `self.compute_thing(side)` and `self.compute_other_thing(side)`
    ...

@cachedmethod()
def compute_yet_another_thing(self, side):
    # Work with results of `self.compute_other_thing(side)`
    ...

I get that there are a lot of use cases you're considering which are pretty different from this one!

Would you consider passing method to the key function as @FlorianWilhelm suggested, even if it is not used in the default hashkey implementation? That way I could wrap cachedmethod in my own function instead of writing my own decorator.

tkem commented

Thanks for explaining your use case - as I said, it depends...

However, I don't think that adding your own "wrapper-decorator" is too much hassle. Don't we do that (or something similar) every day with, e.g., the tools the Python standard library gives us?

For the sake of simplicity and consistency with the @cached decorator, I'd like to keep it the way it is, but explain this more explicitly in the docs. Thanks for pointing out that omission!

BTW, have you ever thought of publishing your decorator-wrapper as an "extension" module to cachetools yourself?

However, I don't think that adding your own "wrapper-decorator" is too much hassle. Don't we do that (or something similar) every day with, e.g., the tools the Python standard library gives us?

Hehe, it's true… Though that is a bit unfortunate, isn't it? 🙈

What I look for in tools like this is to paper over some of the API complexities of the standard library. Ideally this is done in a way that makes the easy things easy, and the hard things possible.

BTW, have you ever thought of publishing your decorator-wrapper as an "extension" module to cachetools yourself?

I'll consider that!

And thanks, I appreciate your documenting this 😀

tkem commented

It's basically a matter of "am I willing and/or able to maintain this", especially since this thing became far more popular than I thought it ever would (according to libraries.io, at least)...

Though I understand and to some degree share your views on this topic (remember, up to v2.0 it actually behaved the way you suggested/expected), I've basically struggled to make this more consistent, more maintainable, and ultimately less effort for me since then ;-)

BTW, documentation reviews, especially by native English speakers, are always welcome, not just for d2a5473!

Software is full of tradeoffs! 🍻

Thanks! I'll try to contribute where I can.