martinsumner/leveled

Memory utilisation for large stores

martinsumner opened this issue · 2 comments

The memory overhead for very large stores can itself be very large.

The number of SST files can become very large (e.g. when using Level 6). There are two potential problems here:

  • The memory footprint of each file might be too high;
  • There is no help given to the GC (i.e. we don't hibernate rarely used files as happens in the Journal); a hibernation sketch follows this list.
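
On the hibernation point, OTP already provides the mechanism: a file process can hibernate after a period of inactivity, which forces a full-sweep GC and shrinks the process to a minimal footprint until the next request arrives. The sketch below shows the general idiom only, assuming a gen_server-shaped file process; the module name, functions and 60s timeout (sst_hibernate_example, fetch/2, HIBERNATE_AFTER) are illustrative, not leveled's actual API.

    %% Sketch only: hibernate a rarely-used file process after a period
    %% of inactivity so the GC can reclaim and compact its heap.
    -module(sst_hibernate_example).
    -behaviour(gen_server).

    -export([start_link/0, fetch/2]).
    -export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

    -define(HIBERNATE_AFTER, 60000).  %% ms of inactivity before hibernating

    start_link() ->
        gen_server:start_link(?MODULE, [], []).

    fetch(Pid, Key) ->
        gen_server:call(Pid, {fetch, Key}).

    init([]) ->
        {ok, #{}, ?HIBERNATE_AFTER}.

    handle_call({fetch, Key}, _From, State) ->
        %% Look the key up in the (here trivial) in-memory state.
        Reply = maps:get(Key, State, not_present),
        %% Every request resets the inactivity timeout.
        {reply, Reply, State, ?HIBERNATE_AFTER}.

    handle_cast(_Msg, State) ->
        {noreply, State, ?HIBERNATE_AFTER}.

    handle_info(timeout, State) ->
        %% No requests for HIBERNATE_AFTER ms - hibernate until the next
        %% message, giving the GC a chance to release memory.
        {noreply, State, hibernate};
    handle_info(_Info, State) ->
        {noreply, State, ?HIBERNATE_AFTER}.

The cost is that the first fetch after hibernation has to rebuild the process heap, which is why this should only apply to rarely used files, as is already the case in the Journal.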

Proposal is to:

  • For files where we're not pre-loading into the page cache, don't have a fetch_cache either. As the fetch_cache contains Key/MD pairs, not just Keys, it might be quite large. It can also be unnecessarily filled by fold_objects or journal_compaction via the pcl_check_sequencenumber function. A sketch of the conditional cache follows below.
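
To make that concrete, here is a minimal sketch of the conditional cache. The names (fetch_cache_example, init_fetch_cache/1, maybe_cache/3, cache_lookup/2, the no_cache atom) are illustrative, not leveled's actual internals.

    %% Sketch only - illustrative names, not leveled's real fetch_cache.
    -module(fetch_cache_example).
    -export([init_fetch_cache/1, maybe_cache/3, cache_lookup/2]).

    %% Only build a fetch_cache when the file is also being pre-loaded
    %% into the OS page cache; otherwise carry the atom no_cache so that
    %% lookups driven by fold_objects or journal compaction can never
    %% populate it.
    init_fetch_cache(true)  -> #{};
    init_fetch_cache(false) -> no_cache.

    maybe_cache(_Key, _KV, no_cache) -> no_cache;
    maybe_cache(Key, KV, Cache)      -> maps:put(Key, KV, Cache).

    cache_lookup(_Key, no_cache)     -> not_cached;
    cache_lookup(Key, Cache)         -> maps:get(Key, Cache, not_cached).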

Another issue is with work backlogs and L0 files. After they're written, L0 files retain a large number of references to large binaries, and will commonly be the biggest individual process in the store. This is normally OK, as L0 files are generally very short-lived.
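
The retention effect here is the standard Erlang one: a sub-binary of a reference-counted (>64 byte) binary keeps the whole parent binary alive until the holding process drops the reference and is garbage collected (or exits). A small self-contained illustration, not leveled code:

    %% Sketch of how holding a small slice can pin a large binary.
    -module(binary_retention_example).
    -export([demo/0]).

    demo() ->
        Big = crypto:strong_rand_bytes(10 * 1024 * 1024),  %% 10MB refc binary
        <<Prefix:16/binary, _/binary>> = Big,
        %% Prefix carries only 16 bytes of payload, but it references Big,
        %% so the full 10MB stays allocated for as long as Prefix is held.
        %% binary:copy/1 detaches the slice from its parent, so the 10MB
        %% can be reclaimed once nothing else references Big and the
        %% process is garbage collected.
        Detached = binary:copy(Prefix),
        erlang:garbage_collect(),
        Detached.

An L0 file process holds many such references to recently ingested objects, which is why it dominates per-process memory for as long as it is alive.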

However, if there is a work backlog, an SST file can only transition from delete_pending to closed if, at the point delete confirmation is requested from the penciller, there is no ongoing work (as during ongoing work the manifest is controlled by the clerk, not the penciller).
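
In outline the confirmation step behaves like the sketch below. The names are hypothetical (delete_confirm_example, confirm_delete/2, the work_ongoing and manifest fields), and the manifest-membership check is an assumption about what confirmation involves; the key point is simply that while the clerk has ongoing work the penciller cannot confirm anything.

    %% Sketch only - hypothetical names, not the real penciller code.
    -module(delete_confirm_example).
    -export([confirm_delete/2]).

    %% PclState is assumed to be a map such as
    %%   #{work_ongoing => boolean(), manifest => [file:filename()]}.
    %% While the clerk has work ongoing, the penciller does not control
    %% the manifest, so it cannot confirm any delete; the file stays in
    %% delete_pending and must ask again later.
    confirm_delete(_Filename, #{work_ongoing := true}) ->
        not_yet;
    confirm_delete(Filename, #{work_ongoing := false, manifest := Manifest}) ->
        case lists:member(Filename, Manifest) of
            true  -> not_yet;    %% still referenced by the manifest
            false -> confirmed   %% safe to go from delete_pending to closed
        end.
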

In pure PUT workloads (such as in riak-admin transfers), this can lead to very high mean durations for L0 files (an in effect lots of L0 files being concurrently alive). The lifetime of a L0 file will go from 10-20s in high balanced loads, to 300-400s. This can mean there are 20-30 L0 files per db (and potentially 100s per Riak node) that are concurrently stuck in delete_pending.