martinsumner/leveled

Reduce open files

martinsumner opened this issue · 5 comments

Investigate performance impact of reducing open files/process limits by having larger files at L4 and below.

To do this, leveled_ebloom needs to support blooms of 32K - 64K objects more efficiently.
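For a sense of scale, here is a minimal sketch of classic Bloom filter sizing using the standard formulas m = -n*ln(p)/(ln 2)^2 bits and k = (m/n)*ln 2 hashes. This is illustrative arithmetic only, not the actual leveled_ebloom implementation:

```erlang
%% Sketch only: classic Bloom filter sizing arithmetic, not the
%% leveled_ebloom implementation. N is the expected object count,
%% P the target false-positive rate; returns {Bits, Bytes, HashCount}.
-module(bloom_sizing).
-export([size_for/2]).

size_for(N, P) when N > 0, P > 0, P < 1 ->
    Ln2 = math:log(2),
    Bits = ceil(-N * math:log(P) / (Ln2 * Ln2)),
    Hashes = max(1, round((Bits / N) * Ln2)),
    {Bits, (Bits + 7) div 8, Hashes}.
```

By these standard formulas, a 64K-object file at a 1% false-positive rate needs roughly 77KB of bloom and 7 hashes, so the per-file bloom cost grows linearly with the larger file sizes.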

The branch changes the shape of the LSM tree (the levels below L3 are reduced in size to reflect the fact that L4 files are larger). However, what is the impact of making this transition? If you had 384 files in L4 and the new limit is 256, there will immediately be a very large backlog.
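As a minimal sketch of that transition arithmetic (assumed counts and limits, not leveled's actual constants), the immediate per-level backlog is simply the excess of the current file count over the new limit:

```erlang
%% Sketch only (assumed numbers, not leveled's constants): given current
%% file counts per level and the new per-level limits, how many files
%% are immediately over-limit and queued as merge work?
-module(transition_backlog).
-export([backlog/2]).

backlog(Counts, Limits) ->
    [{Level, max(0, Count - proplists:get_value(Level, Limits, Count))}
        || {Level, Count} <- Counts].
```

So `transition_backlog:backlog([{4, 384}], [{4, 256}])` returns `[{4, 128}]` - 128 files' worth of merge work appearing at once.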

Is the backlog harmful, especially as multiple vnodes will concurrently have the same backlog on the same node?

How could this transition be smoothed?

The branch has been altered: the LSM shape is now unchanged, but the big files start at Level 3 (not Level 4).

Interestingly, on the standard 24-hour Riak performance test, both versions of the branch show the same throughput as each other and as the control. So the evidence indicates we can make this change without losing performance.

Experimented with making the basement level double size. Performance was OK; however, this meant that when a new level was required there was a rapid increase in the number of open files. As growth can be managed by altering limits, predictable growth is better even if it leads to more open files.
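To illustrate why that growth step is so sharp (assumed file sizes, not leveled's constants): double-sized basement files halve the handle count only while the level remains the basement, and the count snaps back once a new basement level opens beneath it:

```erlang
%% Illustrative arithmetic only (assumed sizes, not leveled's constants):
%% the same basement data needs twice the handles once it is no longer
%% packed into double-sized files.
StdFileSize = 32768,                                     %% objects per standard file
BasementData = 512 * StdFileSize,                        %% objects in the basement
HandlesAsBasement = BasementData div (2 * StdFileSize),  %% 256 double-sized files
HandlesAfterNewLevel = BasementData div StdFileSize.     %% 512 standard files
```

The data volume is unchanged, but the handle count doubles in one step, which matches the disproportionate growth seen in the test below.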

The following screenshot shows 2 x 24-hour tests. The first 24-hour test shows the growth in files with vanilla Riak 3.0.7 (and no file size doubling). The second 24-hour test doubles the size of basement files where the basement is at least L3. The second test handles the same volume with a reduced number of file handles, but it has a period of sudden growth that is disproportionate to the growth in data, and so is operationally risky.

[Image: FileCountByLevel - chart of ledger file counts by Level (per node) in 24-hour volume tests.]

Doubling just the basement size is therefore not to be pursued as an option.

This is the 3-way comparison, including a test where the file size was simply doubled at all levels L3 and below:

[Image: FileCountByLevel_3way]

The throughput and response times in all 3 tests were equivalent (the "double at L3 and below" test was fractionally better, by 0.4%, but within the expected margin of error).