aardvark-platform/aardvark.base

Plugin cache conflicts

luithefirst opened this issue · 7 comments

The aardvark plugins cache is written to a file named AssemblyName + "_plugins.bin", while all other cache files also contain a Guid. This causes conflicts when starting different deployment versions of the same application, which results in many cache misses and slow startup. It would be cleaner to add some kind of version or assembly Guid there as well.
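
To illustrate the collision, here is a minimal sketch in Python (not the actual Aardvark code, which is .NET). The function names and the versioned variant are hypothetical; only the AssemblyName + "_plugins.bin" pattern comes from the current behavior.

```python
def plugins_cache_name(assembly_name: str) -> str:
    # Current scheme as described above: no version or Guid component.
    return f"{assembly_name}_plugins.bin"


def versioned_plugins_cache_name(assembly_name: str, version: str) -> str:
    # Hypothetical variant: the version keeps deployments apart.
    return f"{assembly_name}_{version}_plugins.bin"


# Two deployments of the same application share one cache file today ...
old_a = plugins_cache_name("MyApp")  # deployment 1.0.0
old_b = plugins_cache_name("MyApp")  # deployment 2.0.0

# ... but would get separate files if the version were part of the name.
new_a = versioned_plugins_cache_name("MyApp", "1.0.0")
new_b = versioned_plugins_cache_name("MyApp", "2.0.0")
```

With the current scheme both deployments keep overwriting each other's file, so each start of the other version sees a stale cache.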

Good idea. We should also make sure our cache files no longer end up in the temp folder (something like AppData would be better), since Linux and macOS tend to empty the temp folder on reboot, etc.

One issue with adding a Guid is that during development it would change with every compile, so we would always have slow startup times. The current mechanism looks acceptable. Once it is possible to configure the cache directory (#48), and assuming that the data of major application releases is kept separate, conflicts would be avoided at least in that case. Still, it would be worth reconsidering the caching architecture.

The Guid in the other cases is only a hash of the "query" (e.g. all types with IFieldCodeable); it does not separate caches for different assembly versions. So the other Aardvark caches are also only valid for a single version and are overwritten when a different application version is started.
To me it looks like this could be extended: we could include the version in the filename instead of in the header of the cache file, and the cache file content would then only be a list of types or whatever the "query" returns.

Basically, we have two different types of files in the cache directory:

  • Plugin location caches (<assembly-name>_plugins.bin)
    Contains dictionary <assembly-path> -> <last-seen-write-time>, <has-on-aardvark-init-defs>
    Determines for each DLL / exe in the current assembly path whether it contains any plugins, that is, definitions with [OnAardvarkInit]
  • Introspection caches (<assembly-name>-<query-guid>.txt)
    Contains cached results for the given query on the assembly
    For example, a query to find all definitions with [OnAardvarkInit]
    First line contains metadata:
    "version <version> timestamp <last-seen-write-time-of-assembly>" (Version always 1?)

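As a rough illustration of how the metadata line of an introspection cache could be validated, here is a sketch in Python (not the actual .NET implementation). The names CacheHeader, parse_header and is_cache_valid are hypothetical, and since the exact timestamp format is not given here, the timestamp is treated as an opaque string.

```python
from typing import NamedTuple, Optional


class CacheHeader(NamedTuple):
    version: int
    timestamp: str  # opaque text; the real timestamp format is an unknown here


def parse_header(line: str) -> Optional[CacheHeader]:
    # Expected shape: "version <version> timestamp <last-seen-write-time>"
    parts = line.strip().split(" ", 3)
    if len(parts) != 4 or parts[0] != "version" or parts[2] != "timestamp":
        return None
    try:
        version = int(parts[1])
    except ValueError:
        return None
    return CacheHeader(version, parts[3])


def is_cache_valid(header_line: str, current_write_time: str,
                   expected_version: int = 1) -> bool:
    # The cache is usable only if the format version matches and the
    # assembly has not been modified since the cache was written.
    header = parse_header(header_line)
    return (header is not None
            and header.version == expected_version
            and header.timestamp == current_write_time)
```

Any mismatch (unknown format, different version, changed write time) is treated as a cache miss, so the query would simply be run again.
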
@luithefirst so your suggestion is to add the assembly version for the query caches but not for the plugin location caches? I don't really see how adding the version to the plugin location caches is a problem. Does simply compiling increment the assembly version in your use cases? In that case, incorporating the version into the name would also lead to cache misses for the introspection cache of your assembly.

As for the caching system in general, I wonder whether this loose file-based structure is a good idea if we discriminate between versions as well. My cache directory already contains over 1000 files. Can that become a problem?

> Good idea, we should also make sure our cache files don't end up in the temp folder anymore (something like AppData) since linux/macos tend to empty the temp folder on reboot, etc.

Regarding Environment.SpecialFolder.ApplicationData, where does this point on other platforms? On Ubuntu it seems to point to ~/.config which should be perfectly fine.
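
For comparison, this is roughly what choosing a persistent per-user directory instead of the temp folder could look like, sketched in Python rather than .NET. The function default_cache_dir and the "cache" subfolder are hypothetical. The non-Windows branch mirrors the ~/.config location observed on Ubuntu and honors XDG_CONFIG_HOME; using the same fallback on macOS is an assumption, not something verified here.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def default_cache_dir(app_name: str,
                      env: Optional[Mapping[str, str]] = None,
                      platform: Optional[str] = None) -> Path:
    # env and platform are injectable so the logic can be tested anywhere.
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform

    if platform.startswith("win"):
        # Roaming application data on Windows.
        base = env.get("APPDATA") or str(Path.home() / "AppData" / "Roaming")
    else:
        # Assumption: follow the XDG convention, falling back to ~/.config.
        base = env.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")

    return Path(base) / app_name / "cache"
```

Unlike the temp folder, these locations are not cleared on reboot, so the caches would survive between sessions.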

  • Introspection Caches: I would actually use <assembly-guid>+<query-hash> (or maybe modification date instead of guid) for the cache file name instead of the assembly version as suggested in the OP. The cache would then be truly unique per assembly and we should no longer run into conflicts and misses due to overwritten caches.
    @hyazinthh Yes, there would now be cache misses when I build my app, but this would only affect very few local assemblies.

  • Plugin Caches: Yes, if only a version (that is not incremented every build) is added, it would already solve the conflicts in the majority of cases. If we included a unique identifier like the one I just suggested for the introspection cache, we would have a very slow startup every time during development, because IsPlugin would be tested for every assembly in the startup directory. This should be avoided. Alternatively, we could also:

  1. Store the IsPlugin property in a dedicated unique file, similar to the Introspection Cache. Then we would no longer need the <entry-assembly-name>_plugins.bin file. This would create an even larger number of files, but it would be exact.
  2. Create a database file in <entry-assembly-name>_plugins.bin that accumulates IsPlugin properties in the form of [<assembly-guid>, IsPlugin].
  3. Make the plugins cache file unique per startup assembly path: <assembly-location+name>-hash_plugins.bin. This would cause the same kind of misses as adding the version number, but it would allow several local copies using the same version.
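
To make two of these naming ideas concrete, here is a small sketch in Python (illustrative only, not Aardvark code). It shows an introspection cache name built from the assembly's modification time plus a query hash, and option 3 for the plugins cache, where the name depends on a hash of the startup assembly path. All function names, the use of SHA-1, and the 16-character truncation are assumptions made for the example.

```python
import hashlib
import os


def short_hash(text: str) -> str:
    # Any stable hash would do; SHA-1 truncated to 16 hex chars is arbitrary.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:16]


def introspection_cache_name(assembly_path: str, query: str) -> str:
    # <assembly-name>-<modification-time>-<query-hash>.txt
    # A rebuilt assembly gets a new file instead of overwriting the old one.
    name = os.path.splitext(os.path.basename(assembly_path))[0]
    mtime_ns = os.stat(assembly_path).st_mtime_ns
    return f"{name}-{mtime_ns}-{short_hash(query)}.txt"


def plugins_cache_name_by_location(entry_assembly_path: str) -> str:
    # Option 3: <assembly-name>-<hash of location+name>_plugins.bin
    # Copies of the same version in different folders get separate caches.
    normalized = os.path.normcase(os.path.abspath(entry_assembly_path))
    name = os.path.splitext(os.path.basename(entry_assembly_path))[0]
    return f"{name}-{short_hash(normalized)}_plugins.bin"
```

Note that the introspection name changes whenever the file is rewritten, while the location-based plugins name stays stable across rebuilds in the same folder, which is the property wanted during development.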

Regarding the loose file-based structure: since we typically have an even larger number of shader caches, I have not seen this as an issue so far.

65a6d1a adds a new static class CachingProperties to set the cache directory. Maybe it should be possible to set different naming schemes there as well, since there does not seem to be a single correct solution. E.g. you could set whether the version or the GUID is used for the cache names. The default could be GUID for introspection and version for plugins.

I implemented configurable naming schemes in 7cd190c. They can be changed by setting CachingProperties.PluginsCacheFileNaming and CachingProperties.IntrospectionCacheFileNaming. By default, introspection cache files are named based on the assembly file modification date and plugins cache files based on the assembly version.
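
The effect of the two defaults can be sketched as follows, in Python and purely as an illustration. The enum CacheFileNaming, its members, and cache_file_stem are hypothetical stand-ins; the actual option values and file name layout in CachingProperties are not reproduced here.

```python
import enum


class CacheFileNaming(enum.Enum):
    # Hypothetical members standing in for the real naming options.
    VERSION = "version"
    TIMESTAMP = "timestamp"


def cache_file_stem(assembly_name: str, scheme: CacheFileNaming,
                    version: str, mtime_ns: int) -> str:
    # Pick the component that distinguishes cache files for one assembly.
    if scheme is CacheFileNaming.VERSION:
        return f"{assembly_name}_{version}"
    if scheme is CacheFileNaming.TIMESTAMP:
        return f"{assembly_name}_{mtime_ns}"
    raise ValueError(f"unknown naming scheme: {scheme}")


# Defaults as described above: version for plugins, timestamp for introspection.
PLUGINS_NAMING = CacheFileNaming.VERSION
INTROSPECTION_NAMING = CacheFileNaming.TIMESTAMP
```

A rebuild that keeps the version but changes the file's modification time leaves the plugins cache name untouched and only produces a new introspection cache name for the rebuilt assembly.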

I guess this is the best solution for now, as it does not change anything fundamentally.