Disk-based HTTP caching for LLM apps and beyond.
Motleycache caches LLM and tool calls, as well as other web requests. It works with all requests made using the most popular Python HTTP clients: requests, HTTPX, and Curl CFFI.
pip install motleycache
from motleycache import enable_cache, disable_cache
enable_cache()
# Now all requests made inside your code or any libraries you use will use the cache if available
# To turn caching off, call this:
disable_cache()
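For example, with caching enabled, a repeated identical request is served from disk instead of hitting the network (an illustrative sketch; the URL is just a placeholder):
import requests
from motleycache import enable_cache, disable_cache

enable_cache()
requests.get("https://api.example.com/models")  # fetched over the network and written to the cache
requests.get("https://api.example.com/models")  # identical request: answered from the on-disk cache
disable_cache()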
If you want to only cache certain URLs, you can use a whitelist:
from motleycache import set_cache_whitelist
set_cache_whitelist(["*://api.openai.com/*"])
# Now only requests to the OpenAI API will be cached
Alternatively, you can specify a blacklist. Note that these two options are mutually exclusive.
from motleycache import set_cache_blacklist
set_cache_blacklist(["*://api.openai.com/*"])
# All requests except those made to the OpenAI API will be cached
By default, the cache is stored in the default user cache location, which depends on the platform.
It's possible to set the cache location manually:
from motleycache import set_cache_location
set_cache_location("/path/to/cache/dir") # relative or absolute
There is an option to block all non-cached requests. It is particularly useful for testing your applications. Motleycache will raise a StrongCacheException in case of a cache miss:
from motleycache import set_strong_cache
set_strong_cache(True) # turn on
set_strong_cache(False) # turn off
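With strong caching on, a cache miss can be handled explicitly. The sketch below assumes StrongCacheException is importable from the top-level package; the URL is a placeholder:
import requests
from motleycache import enable_cache, set_strong_cache
from motleycache import StrongCacheException  # assumed top-level export

enable_cache()
set_strong_cache(True)
try:
    requests.get("https://api.example.com/models")  # placeholder URL
except StrongCacheException:
    print("No cached response for this request; refusing to hit the network")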
If you only want to use the existing cache, you can disable updating the cache with new requests:
from motleycache import set_update_cache_if_exists
set_update_cache_if_exists(False) # disable
set_update_cache_if_exists(True) # enable
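Combined with strong caching, this makes it easy to replay pre-recorded responses in tests. Here is a hypothetical pytest fixture built on these calls:
import pytest
from motleycache import (
    enable_cache,
    disable_cache,
    set_strong_cache,
    set_update_cache_if_exists,
)

@pytest.fixture
def replayed_http():
    # Serve only pre-recorded responses: never hit the network, never write new entries
    enable_cache()
    set_strong_cache(True)
    set_update_cache_if_exists(False)
    yield
    set_strong_cache(False)
    set_update_cache_if_exists(True)
    disable_cache()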
Motleycache was first designed to be a part of our multi-agent framework motleycrew (check it out!), but we soon decided to make it a separate lightweight library instead. Because of these origins, motleycache also provides integration with the Lunary observability platform, so that cached calls are indicated in your traces.
To find out more, see our Caching and observability docs page.
When enable_cache is called, we decorate the request functions of supported clients with our caching wrapper. When a request is made to a cacheable URL, a hash of all its parameters is computed, the corresponding cache file (containing the pickled response) is retrieved if it exists, and its unpickled contents are returned to the caller.
In case of a cache miss, the request is passed on to the original client function (unless strong caching is enabled), and the response is then added to the cache (unless cache updating is turned off).
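Conceptually, the wrapper behaves roughly like the following sketch (simplified for illustration; this is not motleycache's actual code, and the hashing scheme and file layout here are assumptions):
import hashlib
import pickle
from pathlib import Path

CACHE_DIR = Path("/tmp/motleycache-sketch")  # illustrative location

def cached_request(original_func, url, **kwargs):
    # Build a cache key from a hash of all request parameters
    key = hashlib.sha256(repr((url, sorted(kwargs.items()))).encode()).hexdigest()
    cache_file = CACHE_DIR / f"{key}.pkl"
    if cache_file.exists():
        # Cache hit: return the unpickled stored response
        return pickle.loads(cache_file.read_bytes())
    # Cache miss: fall through to the original client function
    response = original_func(url, **kwargs)
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(pickle.dumps(response))  # store the pickled response
    return response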
If you use some other HTTP client and want us to support it, please raise an issue in this repo. Contributions are also welcome!