basho/bitcask

Another way for fstats to leak


Thanks to @joecaswell for finding this.

This leak is somewhat hard to trigger, but certain pathological access patterns can cause it.

Merge creates files, and thus creates fstats entries. These entries are not synchronized with the write thread in any way, but once a value within one of these files is read, the file is added to the write thread's read_files state, from where it is later trimmed. So, to reproduce the leak, you need a file that is created by merge, never read from, and then merged away again.
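To make the mechanism concrete, here is a toy model in Python. It is not bitcask's code: `ToyStore` and its method names are hypothetical, and it assumes one reading of the description above, namely that trimming only reaches files that have made it into read_files.

```python
# Toy model of the leak described above. All names are hypothetical;
# this only mirrors the bookkeeping, not bitcask's real implementation.
class ToyStore:
    def __init__(self):
        self.fstats = {}         # file id -> stats entry (stand-in for fstats)
        self.read_files = set()  # files the write thread has learned about via reads
        self.next_id = 0

    def merge_create(self):
        """Merge creates a file and its fstats entry, unseen by the write thread."""
        fid = self.next_id
        self.next_id += 1
        self.fstats[fid] = {"live_keys": 1}
        return fid

    def read(self, fid):
        """Reading a value registers the file in read_files."""
        self.read_files.add(fid)

    def merge_away(self, fid):
        """The file is deleted; only files tracked in read_files get trimmed."""
        if fid in self.read_files:
            self.read_files.discard(fid)
            self.fstats.pop(fid, None)


store = ToyStore()

seen = store.merge_create()
store.read(seen)
store.merge_away(seen)       # was read, so its entry is trimmed

unseen = store.merge_create()
store.merge_away(unseen)     # never read, so its entry stays behind

leaked = sorted(store.fstats)
print(leaked)
```

In this model the only entry left in `fstats` belongs to the file that was created by merge and merged away without ever being read.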

The scenario where we saw this in the wild seemed to be:

  1. a few very hot keys in a partition.
  2. the remaining values sit in a file smaller than the small file threshold.
  3. because of the hot keys, the main write file is seen as extremely fragmented, so when needs_merge comes around, both files are selected for the partial merge.
  4. a new merge file is created containing the data from the small file, which yields another small file.
  5. the cycle repeats: apart from the occasional value, nothing is ever added to the small file, since the hot keys account for most writes to the partition.
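The cycle can be sketched as a small Python simulation. Everything here is an assumption made for illustration: the threshold values, the simplified selection rule in `select_for_merge` (the real needs_merge logic has more triggers), and the idea that the hot write file's entry is cleaned up normally while the never-read small file's entry is not.

```python
from dataclasses import dataclass

# Illustrative thresholds only; not taken from any bitcask configuration.
SMALL_FILE_BYTES = 10 * 1024 * 1024
FRAG_THRESHOLD_PCT = 40


@dataclass
class FileStat:
    file_id: int
    size_bytes: int
    frag_pct: int


def select_for_merge(stats):
    """Simplified rule: pick files that are fragmented or below the small file size."""
    return [s.file_id for s in stats
            if s.frag_pct >= FRAG_THRESHOLD_PCT or s.size_bytes < SMALL_FILE_BYTES]


def run_cycles(n):
    """Return how many fstats entries outlive their files after n merge cycles."""
    ids = iter(range(1_000_000))
    small = FileStat(next(ids), 1_000_000, 0)   # cold values, never read
    fstats = {small.file_id}
    live = {small.file_id}
    for _ in range(n):
        # Hot keys make the write file look heavily fragmented.
        hot = FileStat(next(ids), 500_000_000, 90)
        fstats.add(hot.file_id)
        live.add(hot.file_id)

        selected = select_for_merge([hot, small])   # both files qualify

        # Merge output: the cold data lands in yet another small file.
        merged = FileStat(next(ids), small.size_bytes, 0)
        fstats.add(merged.file_id)
        live.add(merged.file_id)

        live -= set(selected)            # merge inputs are deleted
        fstats.discard(hot.file_id)      # assumed: writer-known file is trimmed
        # The old small file was never read, so its entry is not trimmed.
        small = merged
    return len(fstats - live)


stale_after_3 = run_cycles(3)
stale_after_50 = run_cycles(50)
print(stale_after_3, stale_after_50)
```

Under these assumptions, each pass through the cycle leaves one more stale entry behind, so the leak grows with the number of merges rather than with the amount of data.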