No way to update or invalidate cache
I'm using Haxl to talk to a REST API, and for the sake of example, we can pretend there are two endpoints. One endpoint lets me fetch a `Message` given an ID, and another endpoint lets me fetch, say, `n` messages that were sent before a given ID.
```haskell
data FooRequest a where
  GetMessage        :: Id -> FooRequest Message
  GetMessagesBefore :: Id -> Int -> FooRequest [Message]
```
Whenever I perform a `GetMessagesBefore` request, I would like to be able to cache the results of this request such that performing a `GetMessage` request with an id fetched previously via `GetMessagesBefore` hits the cache.
I can do something like:
```haskell
-- let's pretend we have `env`
getMessagesBefore id_ count = runHaxl env $ do
  res <- dataFetch (GetMessagesBefore id_ count)
  for res $ \message -> cacheRequest (GetMessage (messageId message)) (Right message)
```
However, `cacheRequest` throws an exception if the request has already been cached.
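One workaround at the call site is to catch and ignore that exception. A sketch, assuming the duplicate is signalled as a `DataSourceError` (which is what `cacheRequest` appears to raise in the Haxl source); `tryCacheRequest` is a made-up helper name, not part of the library:

```haskell
{-# LANGUAGE ConstraintKinds, ScopedTypeVariables #-}
import Control.Exception (SomeException)
import Haxl.Core (GenHaxl, Request, cacheRequest, catch)
import Haxl.Core.Exception (DataSourceError)

-- Made-up helper: seed the cache, but treat "already cached" as success
-- instead of letting the DataSourceError propagate.  Note that this also
-- swallows a genuine conflict (a different value already cached for the
-- same request), so it is only a workaround, not an answer to the
-- questions below.
tryCacheRequest :: Request req a => req a -> Either SomeException a -> GenHaxl u ()
tryCacheRequest req res =
  cacheRequest req res `catch` \(_ :: DataSourceError) -> return ()
```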
- Is there some fundamental reason why we shouldn't be updating the cache? I can see nothing regarding cache invalidation anywhere in the Haxl codebase. I think I can probably whip something up to be able to update the cache, but I'd rather not if it's going to break any invariants.
- Having two different requests that fetch the same data (one fetching a single item, the other several) seems like a pretty common occurrence, and having such requests intelligently share the cache would be a cool thing to have in the library. Are there any plans to implement something like this?
Thanks for your time!
I have written the following:
```haskell
-- Attempt at an updateCache that overwrites an existing cache entry.
updateCache
  :: (Show a, Eq (req a), Hashable (req a), Show (req a), Typeable (req a))
  => req a
  -> a
  -> GenHaxl u ()
updateCache request result = GenHaxl $ \env -> do
  cache <- readIORef (cacheRef env)
  case DC.lookup request cache of
    -- Not cached yet: insert a new, already-filled IVar.
    Nothing -> do
      ivar <- newFullIVar $ Ok result
      writeIORef (cacheRef env) $! DC.insert request ivar cache
      done (Ok ())
    Just iv@(IVar cr) -> do
      e <- readIORef cr
      case e of
        -- The cached computation threw: re-throw.
        IVarFull (ThrowIO ex) ->
          return $ Throw ex
        IVarFull (ThrowHaxl ex) ->
          return $ Throw ex
        -- Completed successfully, or empty with no waiting jobs:
        -- overwrite the result in place.
        IVarFull (Ok _) -> do
          writeIORef cr $! IVarFull (Ok result)
          done (Ok ())
        IVarEmpty JobNil -> do
          writeIORef cr $! IVarFull (Ok result)
          done (Ok ())
        -- Jobs are still blocked on this IVar: block ourselves and retry.
        IVarEmpty _ ->
          return $ Blocked iv (Cont $ updateCache request result)
```
However, while I think I understand the basics of Haxl's architecture, I can't really determine whether this is correct or whether it will break everything.
EDIT:
This does indeed break everything. Starting an async fetch and then modifying the cache this way crashes, because it's not possible to add a job to fetch something that has already been fetched or is already in the cache.
I've made some other attempts but so far I've been unsuccessful.
Having a cache without the ability to update, modify, clear, or invalidate it sounds like a serious problem.
@saevarb this is a completely reasonable thing to want to do. Indeed, I've wanted to do similar things myself on occasion, but never got around to implementing it.
To answer your specific points:
- We don't allow the cache to be modified, because we would lose the property that evaluating a given computation twice yields the same result, which is what we rely on for `memo` and the other memoization primitives to be valid. However, it's reasonable to have an operation like `cacheRequest` which adds an entry to the cache if it isn't already there; this will require an additional `Eq` constraint to compare results. (A rough sketch of such an operation follows this list.)
- Yes, absolutely. I think it would be great to have better support for this. Adding entries to the cache manually (as in your example) is one way, but could be inefficient in general. What we probably need is a more general way to look for a cached result, but I haven't tried to do this, and it's not clear to me yet what the right abstractions are. It would definitely be interesting to explore this.
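A rough sketch of that idempotent variant, written in the same style as the `updateCache` attempt above and against the same assumed internals (`cacheRef`, `DC`, `IVar`, `done`, `Throw`, with the corresponding imports omitted); the name `cacheRequestMaybe` is hypothetical:

```haskell
-- Hypothetical sketch, not part of Haxl: insert a result if the request
-- is not cached yet, succeed silently if an equal result is already
-- cached, and fail only on a conflicting result.  Cached entries are
-- never overwritten, so the "same computation, same result" property
-- that memoization relies on is preserved.  The Eq constraint on the
-- result type is what lets us detect conflicts.
cacheRequestMaybe
  :: (Eq a, Eq (req a), Hashable (req a), Typeable (req a))
  => req a
  -> a
  -> GenHaxl u ()
cacheRequestMaybe request result = GenHaxl $ \env -> do
  cache <- readIORef (cacheRef env)
  case DC.lookup request cache of
    -- Not cached yet: behave exactly like cacheRequest.
    Nothing -> do
      ivar <- newFullIVar $ Ok result
      writeIORef (cacheRef env) $! DC.insert request ivar cache
      done (Ok ())
    -- Already cached: never overwrite; compare instead.
    Just (IVar cr) -> do
      e <- readIORef cr
      case e of
        IVarFull (Ok old)
          | old == result -> done (Ok ())  -- same value, nothing to do
        _ -> return $ Throw $ toException $
               ErrorCall "cacheRequestMaybe: conflicting cached entry"
```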
@simonmar wouldn't the cache invalidation itself require an additional function to evaluate to see if / when the proper action can take place? As part of your traversal you can take in something like a time to represent the state which can invalidate. A new time means new data and will not violate RT.
@somethingconcon to avoid confusion, could you give the type(s) of the operations you're suggesting?
@simonmar Sorry, I'm not 100% sure what I should provide as an example. Could you explain what your expectation would be so I can provide a proper example?
@somethingconcon Well, I didn't fully understand your comment, so I think making it concrete with some actual code would help. e.g. I'm not sure what the "additional function" is, not sure what the traversal you mention is, not sure what would need to take a time or why. Basically could you explain in more detail please :)
Oh, yes. My apologies on not being clear. For the sake of brevity, it looks like I cut out all important detail. I will try to provide a code example after the workday. Thanks! :)
@saevarb `prepareMemo` and `runMemo` sound like they would help with one and possibly two.
If you don't need that granularity, then `cachedComputation` is a convenience function that handles such details behind the scenes.
The items in a cached computation aren't stored across rounds, however; but it does mean that fetches in the same round with colliding signatures are handled the way you're after.
`uncachedRequest` is the machinery for 'updating', as far as I can tell.
The request's signature determines which slot in the result map your value is assigned to, so depending on which parameters your `Hashable` instance uses for the signature, it should transparently update the value for subsequent fetches.
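For illustration, a minimal sketch of the `cachedComputation` approach, reusing the types from the opening comment (`FooRequest`, `Id`, `Message`, `messageId` come from there; the wrapper name `getMessagesBefore'` is made up):

```haskell
import Data.Foldable (for_)
import Haxl.Core (GenHaxl, cacheRequest, cachedComputation, dataFetch)

-- Hypothetical wrapper: key the whole computation on the
-- GetMessagesBefore request, so repeated calls with the same arguments
-- inside one runHaxl share a single fetch and a single round of
-- cache-seeding for the individual messages.
getMessagesBefore' :: Id -> Int -> GenHaxl u [Message]
getMessagesBefore' id_ count =
  cachedComputation (GetMessagesBefore id_ count) $ do
    messages <- dataFetch (GetMessagesBefore id_ count)
    -- Seed the per-message cache as in the opening comment; note that
    -- cacheRequest still throws if a GetMessage entry already exists.
    for_ messages $ \message ->
      cacheRequest (GetMessage (messageId message)) (Right message)
    return messages
```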