tatami-inc/tatami_hdf5

Allow specification of the cache types in the HDF5 readers.

LTLA opened this issue · 2 comments

LTLA commented

Additionally, add helper functions that inspect the type of the HDF5 dataset and choose a matching cache type.
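A minimal sketch of what such a helper might look like. Everything here is hypothetical: `StoredType`, `CacheType` and `choose_cache_type()` are illustrative names, not part of tatami_hdf5, and the fields of `StoredType` are assumed to be filled by querying the HDF5 datatype of the dataset (class, size, sign) elsewhere.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical summary of an on-disk numeric type. In real code, these
// fields would be filled by querying the HDF5 datatype of the dataset.
struct StoredType {
    bool is_float;     // floating-point (true) or integer (false).
    bool is_signed;    // only meaningful for integers.
    std::size_t bytes; // size of each stored value in bytes.
};

// Hypothetical set of cache types that a caller is willing to instantiate.
enum class CacheType {
    INT8, UINT8, INT16, UINT16, INT32, UINT32, INT64, UINT64, FLOAT32, FLOAT64
};

// Pick the smallest cache type that can hold every value of the stored type.
constexpr CacheType choose_cache_type(StoredType t) {
    if (t.is_float) {
        return t.bytes <= 4 ? CacheType::FLOAT32 : CacheType::FLOAT64;
    }
    if (t.bytes <= 1) {
        return t.is_signed ? CacheType::INT8 : CacheType::UINT8;
    }
    if (t.bytes <= 2) {
        return t.is_signed ? CacheType::INT16 : CacheType::UINT16;
    }
    if (t.bytes <= 4) {
        return t.is_signed ? CacheType::INT32 : CacheType::UINT32;
    }
    return t.is_signed ? CacheType::INT64 : CacheType::UINT64;
}
```

The result is only a runtime tag; something still has to map each tag to a concrete template instantiation, which is where the binary size cost comes from.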

LTLA commented

This can be done below the class template level by reading the data into a suitably aligned void* buffer, then casting the pointer to the cache type before copying its contents to the output buffer. It might be a little inconvenient, though, as we now need to template the interpretation of the pointer and pay the cost of choosing the pointer type at each fetch() invocation.
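A rough sketch of the void* idea, to make the per-fetch cost concrete. The names (`CacheKind`, `copy_as()`, `copy_from_cache()`, `example_fetch()`) are made up for illustration, and the `memcpy` is only a stand-in for the actual HDF5 read into the cache.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>
#include <vector>

// Hypothetical runtime tag for the type held in the untyped cache.
enum class CacheKind { UINT8, UINT16, INT32, FLOAT64 };

// Interpret the untyped cache as an array of Cached_ and copy it to the output.
template<typename Cached_, typename Value_>
void copy_as(const void* cache, std::size_t n, Value_* out) {
    std::copy_n(static_cast<const Cached_*>(cache), n, out);
}

// The switch runs on every call, which is the per-fetch cost of this design.
template<typename Value_>
void copy_from_cache(const void* cache, CacheKind kind, std::size_t n, Value_* out) {
    assert(cache != nullptr);
    switch (kind) {
        case CacheKind::UINT8:   copy_as<std::uint8_t>(cache, n, out);  break;
        case CacheKind::UINT16:  copy_as<std::uint16_t>(cache, n, out); break;
        case CacheKind::INT32:   copy_as<std::int32_t>(cache, n, out);  break;
        case CacheKind::FLOAT64: copy_as<double>(cache, n, out);        break;
    }
}

// Example: a cache of 16-bit unsigned values is copied into a double buffer.
std::vector<double> example_fetch() {
    const std::uint16_t stored[3] = { 1, 2, 65535 };
    void* cache = ::operator new(sizeof(stored)); // aligned for any fundamental type.
    std::memcpy(cache, stored, sizeof(stored));   // stand-in for the HDF5 read.
    std::vector<double> out(3);
    copy_from_cache(cache, CacheKind::UINT16, out.size(), out.data());
    ::operator delete(cache);
    return out;
}
```

Note that `copy_as()` is still instantiated once per cache type, so this approach does not avoid templating entirely; it only moves it out of the class template.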

LTLA commented

Well, in the end, I just templated on the cache types, and to hell with it. (d8d5b64)

Basically, we see a tension between speed, memory usage and binary size, and each implementation can only pick two.

  • The current situation (cache type = interface type) gives us speed, as the types are all known at compile time; and a small binary size, as there aren't any additional template instantiations. However, this comes at the cost of memory usage in the cache. More specifically, caching is less efficient because fewer chunks fit into a cache of a given size.
  • Templating on the cache types retains the speed and improves memory usage. However, if we want to achieve maximum memory optimization, we need to instantiate the template for each possible cache type, resulting in a large binary. For sparse matrices, this is squared because we need to consider all combinations of cache types for values and indices.
  • Dynamically switching between types with a void* pointer provides the best memory usage without any extra template instantiations. However, this is slower as every vector access needs to do an extra check on the type, e.g., with tatami::SomeNumericArray. Or if we cast the pointer itself, we still need to do some templating to use that pointer. Also, the implementation is a nightmare as we need to make sure we get correct alignment for the larger types.
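To put rough numbers on the memory side of this trade-off, here is a small sketch. `chunks_in_cache()` and `SparseCacheSketch` are hypothetical helpers for illustration only, and the figures in the comments assume an 8-byte `double`.

```cpp
#include <cstddef>
#include <cstdint>

// Number of dense chunks that fit in a cache of 'cache_bytes' bytes when
// each chunk holds 'chunk_elements' values of type Cached_.
template<typename Cached_>
constexpr std::size_t chunks_in_cache(std::size_t cache_bytes, std::size_t chunk_elements) {
    return cache_bytes / (chunk_elements * sizeof(Cached_));
}

// For sparse data, each non-zero element needs a value and an index, so the
// cache is templated on two types and every pair is a separate instantiation.
template<typename CachedValue_, typename CachedIndex_>
struct SparseCacheSketch {
    static constexpr std::size_t bytes_per_nonzero = sizeof(CachedValue_) + sizeof(CachedIndex_);

    // Number of non-zero elements that fit in a cache of 'cache_bytes' bytes.
    static constexpr std::size_t capacity(std::size_t cache_bytes) {
        return cache_bytes / bytes_per_nonzero;
    }
};

// With a 1 MB cache and chunks of 10000 values:
//   chunks_in_cache<double>(1000000, 10000)        -> 12 chunks
//   chunks_in_cache<std::uint16_t>(1000000, 10000) -> 50 chunks
```

Supporting, say, 10 value types and 10 index types for the sparse case would mean up to 100 instantiations of something like `SparseCacheSketch`, which is the "squared" cost mentioned above.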

All things considered, templating on the types is the best compromise. It's fast and simple, yet it still allows users to decide whether they want to bother with fine-tuning cache performance at the expense of their binary size.
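As an illustration of how a user might opt into only a few instantiations, here is a hedged sketch. `ReaderBase`, `Reader` and `make_reader()` are hypothetical stand-ins, not the tatami_hdf5 classes, and the interface type is assumed to be `double`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical type-erased interface, standing in for a matrix class.
struct ReaderBase {
    virtual ~ReaderBase() = default;
    virtual std::size_t cached_value_bytes() const = 0;
};

// Hypothetical reader templated on the cache type.
template<typename Cached_>
struct Reader final : public ReaderBase {
    std::size_t cached_value_bytes() const override {
        return sizeof(Cached_);
    }
};

// Only two instantiations are compiled: a compact one for small unsigned
// integers (e.g., count data) and a fallback to the interface type.
inline std::unique_ptr<ReaderBase> make_reader(bool is_unsigned_integer, std::size_t stored_bytes) {
    assert(stored_bytes > 0);
    if (is_unsigned_integer && stored_bytes <= 2) {
        return std::make_unique<Reader<std::uint16_t> >();
    }
    return std::make_unique<Reader<double> >(); // assumed interface type.
}
```

The type decision happens once at construction rather than at each fetch, and adding more branches buys better cache usage at the cost of a larger binary.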