How to implement manual cache invalidation?
Closed this issue · 2 comments
I am writing a tool that gets data from an API that always claims the responses are uncacheable, even though the data is mostly rather static. Using the code from #173 (comment), I rewrite the headers to still cache all responses, and this provides a tremendous speedup.
As said in #173, I am using code generated from the OpenAPI spec, so I don't have easy access to the single requests but am switching between a caching and a non-caching client that is an input parameter to the generated Python functions.
Now, I'm looking into cache invalidation. The API provides a way to get a list of data IDs that have changed since a certain point in time.
My current thinking goes like this (I'm using FileStorage
):
- I read a timestamp file in hishel's cache directory.
- I record the current timestamp in hishel's cache directory.
- I query the API for changed data (using the timestamp from the first step) using the non-caching client.
- I remove the cache of the API endpoints that point to changed data (#241 will help here).
- I hit all those API endpoints again using the caching client.
- My business logic will see fresh data.
Does that sound workable, or is there an easier way even?
Can I somehow say "ignore the cached data for this API endpoint but write out new cache data"?
We have introduced force_caching
for the controller class, you can use it instead of writing a custom transport as suggested in #173 (comment).
When force_caching
is enabled, responses are invalidated only when the TTL of the storage expires. So yes, I think you can enable it and call some vacuum function that will invalidate all the stale responses and remove them using the API that will be introduced in #241.
Here is how it can look like for the filestorage
import hishel
import httpx
class AsyncVaccumFileStorage(hishel.AsyncFileStorage):
async def vaccum(self):
async with self._lock:
# vaccum stuff here
...
storage = AsyncVaccumFileStorage()
cache_transport = hishel.CacheTransport(
controller=hishel.Controller(force_cache=True),
storage=storage,
)
await storage.vaccum()
Can I somehow say "ignore the cached data for this API endpoint but write out new cache data"?
Yes, you can write a custom controller to do that, like so:
from httpcore import Request, Response
import hishel
import httpx
from hishel._serializers import Metadata
class MyStorage(hishel.FileStorage):
def store(self, key: str, response: Response, request: Request, metadata: Metadata | None = None) -> None:
print('storing response', key, response)
return super().store(key, response, request, metadata)
class IgnoreCacheController(hishel.Controller):
def construct_response_from_cache(
self, request: Request, response: Response, original_request: Request
) -> Response | Request | None:
if request.extensions.get("ignore_cache"):
return None
return super().construct_response_from_cache(request, response, original_request)
client = httpx.Client(
transport=hishel.CacheTransport(
transport=httpx.HTTPTransport(),
controller=IgnoreCacheController(),
storage=MyStorage(),
)
)
client.get("https://hishel.com")
response = client.get("https://hishel.com", extensions={"ignore_cache": True})
assert response.extensions["from_cache"] is False
Thank you very much for this detailed response. I have modernized my usage with the controller's force_cache
now and am looking forward to the new remove
method for storage.