A robust and extensible package to cache on disk the results of expensive calculations.
Consider an expensive function parse that takes a path and returns a parsed version:
>>> content = parse("source.txt")
It would be nice to cache this result automatically and persistently, and this is where flexcache comes in.
First, we create a DiskCache object:
>>> from flexcache import DiskCacheByMTime
>>> dc = DiskCacheByMTime(cache_folder="/my/cache/folder")
and then load the content through it:
>>> content, basename = dc.load("source.txt", converter=parse)
If this is the first call, the cached result is not yet available, so parse will be called on source.txt and its output will be saved and returned. On subsequent calls, the cached result will be loaded and returned.
When the source file changes, the DiskCache detects that the cached file is older, calls parse again, and stores and returns the new result.
In certain cases you would rather detect that the file has changed by hashing its content. Simply use DiskCacheByHash instead of DiskCacheByMTime.
Cached files are saved using the pickle protocol, and each has a companion JSON file with the header content.
This idea is completely flexible and applies not only to parsers. In flexcache we say there are two types of objects: the source object and the converted object. The converter function maps the former into the latter. The cache stores the latter, keyed by a customizable aspect of the former.
In certain cases you would like to customize how caching and invalidation is done.
You can achieve this by subclassing the DiskCache.
>>> import pathlib
>>> from dataclasses import dataclass
>>> from flexcache import DiskCache, NameByPathHeader, InvalidateByExist, BasicPythonHeader
>>> class MyDiskCache(DiskCache):
...
...     @dataclass(frozen=True)
...     class MyHeader(NameByPathHeader, InvalidateByExist, BasicPythonHeader):
...         pass
...
...     _header_classes = {pathlib.Path: MyHeader}
Here we created a custom header class and used it to handle pathlib.Path objects. You can even register multiple header classes in the same cache class to handle different source object types.
We provide a convenient set of mixable classes to achieve almost any behavior. These are divided into three categories, and you must choose at least one from each.
These classes store the information that will be saved alongside the cached file.
- BaseHeader: source object and identifier of the converter function.
- BasicPythonHeader: source and identifier of the converter function, platform, python implementation, python version.
These classes define how the cache decides whether the cached converted object is still a valid representation of the source object.
- InvalidateByExist: the cached file must exist.
- InvalidateByPathMTime: the cached file exists and is newer than the source object (which has to be pathlib.Path).
- InvalidateByMultiPathsMtime: the cached file exists and is newer than each path in the source object (which has to be tuple[pathlib.Path]).
These classes define how the name is generated. The basename for the cache file is a hash hexdigest built by feeding a collection of values determined by the Header object.
- NameByFields: all fields except the source_object.
- NameByPath: resolved path of the source object (which has to be pathlib.Path).
- NameByMultiPaths: resolved path of each path in the source object (which has to be tuple[pathlib.Path]), sorted in ascending order.
- NameByFileContent: the bytes content of the file referred to by the source object (which has to be pathlib.Path).
- NameByHashIter: the values in the source object (which has to be tuple[str]), sorted in ascending order.
- NameByObj: the pickled version of the source object (which has to be picklable), using the highest available protocol. This also adds pickle_protocol to the header.
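All of these naming strategies reduce to feeding a sequence of values into a hash and using the hexdigest as the basename. Here is a minimal stdlib sketch of two of them; sha256 is used for illustration, and flexcache's actual hash function and feeding order may differ.

```python
import hashlib
import pickle


def name_by_hash_iter(values: tuple) -> str:
    # NameByHashIter-style: feed each string value, sorted in
    # ascending order, into the hash.
    h = hashlib.sha256()
    for value in sorted(values):
        h.update(value.encode("utf-8"))
    return h.hexdigest()


def name_by_obj(obj) -> str:
    # NameByObj-style: hash the pickled source object, using the
    # highest available pickle protocol.
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    return hashlib.sha256(payload).hexdigest()
```

Sorting before hashing makes the basename independent of the order in which the values are supplied, so two equivalent source tuples map to the same cache file.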
You can mix and match these as you see fit, and of course, you can write your own.
Finally, you can also avoid saving the header by setting the _store_header class attribute to False.
This project was started as a part of Pint, the Python units package.
See AUTHORS for a list of the maintainers.
To review an ordered list of notable changes for each version of this project, see CHANGES.