tikv/raft-engine

Sparse entry indices

v0y4g3r opened this issue · 3 comments

In #303 we can see some users complain about the memory usage.

The memory footprint of entry indices largely depends on the average size of entries. File purge and compaction are based on log file total size. The smaller the avg entry size is, the more entry indices will reside in memory.

In some cases, point-get of log entries is less frequent than bulk loads that reads a range of entries and applies those entries to some state machine. So we can use sparse indices to further reduce the memory footprint of raft-engine, for example, we can bookkeep the entry id to offset mapping every N entries thus the memory consumption can be reduced to 1/N.

Sparse indices means we can no longer get the exact offset of some arbitrary entry id, the overhead of reading those needless entries can be amortized in bulk load cases.

It's a good idea. I already have a PR that merge the file location field of multiple entries into one u32 (#208). We just need to move the entry offset into it as well.

The reason I didn't merge that PR is because it apparently causes regression to both memory usage and performance in some corner cases (e.g. user writes only one entry in a log batch). I prefer we can at least make it gated by a feature.

It's a good idea. I already have a PR that merge the file location field of multiple entries into one u32 (#208). We just need to move the entry offset into it as well.

The reason I didn't merge that PR is because it apparently causes regression to both memory usage and performance in some corner cases (e.g. user writes only one entry in a log batch). I prefer we can at least make it gated by a feature.

Glad to see that feature is on the road map. Maybe we can list the work needs to do and I'd be happy if I can help.

We're going to build a quorum-based log storage like Aurora's storage service. I think raft-engine should be suitable as the local storage.

@v0y4g3r Sorry for the delay. To move forward this feature, I'll first need to (1) polish #208 with feature gate, then (2) you can propose a change that fully embeds entry offset as well. Because we don't currently need this feature in TiKV, I would anticipate the review process could take a fair amount of time before we reach step 2.

I'm curious though, is this a real problem you encountered while using the engine? For the case you mentioned ("The smaller the avg entry size is, the more entry indices will reside in memory."), we can also add another memory metrics other than the existing file size metrics purge_threshold to speed up the purge. In reality it also makes more sense for user application to control the total Raft log entry count rather than total Raft log size.