facebook/rocksdb

Feature: realtime Secondary TryCatchUpWithPrimary

rockeet opened this issue · 0 comments

We have written a FUSE file system that intentionally blocks reads at EOF on files that are still being written. While the primary instance is writing the WAL and manifest files, an EOF read on the secondary instance blocks; once the primary writes to (or closes) the WAL or manifest, the blocked read on the secondary returns at once with the newly written data. Our benchmark showed a latency of ~100us on commodity Ethernet (with NFS + O_DIRECT).

With this feature, the secondary instance does not need to sleep between TryCatchUpWithPrimary loops, but the current implementation has two issues:

  1. mutex_ is held for a long time in TryCatchUpWithPrimary
  2. catching up the WAL depends on catching up the manifest
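
For context, the difference between a sleeping catch-up loop and a blocking one can be sketched as below. This is only an illustration: `RunCatchUpLoop` and the `try_catch_up` callable are hypothetical names (in a real program the callable would wrap `db->TryCatchUpWithPrimary()` and check the returned status), and the blocking behavior is assumed to come from the FUSE file system described above, not from RocksDB itself.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: calls `try_catch_up` repeatedly until `stop` is set
// or the callable reports failure. Returns the number of successful calls.
// With a zero `sleep_between`, the loop never sleeps on its own and relies
// on `try_catch_up` blocking until the primary has written new data.
inline int RunCatchUpLoop(const std::function<bool()>& try_catch_up,
                          const std::atomic<bool>& stop,
                          std::chrono::milliseconds sleep_between) {
  int successes = 0;
  while (!stop.load()) {
    if (!try_catch_up()) break;  // stop on the first failed catch-up
    ++successes;
    if (sleep_between.count() > 0) {
      // Conventional polling: wait before asking the primary again.
      std::this_thread::sleep_for(sleep_between);
    }
  }
  return successes;
}
```

Passing `std::chrono::milliseconds(0)` gives the sleep-free loop this issue is about; a non-zero value gives the usual polling loop.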

If the long lock on mutex_ can be replaced with a short lock, and the dependency removed so that the WAL and the manifest are caught up in two different threads, that would be a perfect solution.
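
The general pattern being asked for can be sketched in plain C++: do the slow, possibly blocking read with no lock held, take the mutex only to publish the result, and tail the WAL and the manifest from separate threads. This is not RocksDB code. `SharedState`, `TailLoop`, `TailBoth`, and the `ReadNext` callables are hypothetical names, and the sketch ignores whatever ordering constraints exist between manifest and WAL replay inside the real secondary instance.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical shared state guarded by one mutex, standing in for the DB
// state that catch-up work must update.
struct SharedState {
  std::mutex mu;
  std::vector<std::string> applied;  // records published so far
};

// Reads the next record into *out. May block for a long time (for example
// on an EOF read). Returns false when the source is finished.
using ReadNext = std::function<bool(std::string*)>;

// Tails one source, holding the mutex only while publishing a record.
inline void TailLoop(SharedState& state, const ReadNext& read_next) {
  std::string record;
  while (read_next(&record)) {  // slow part: no lock held while waiting
    std::lock_guard<std::mutex> guard(state.mu);  // short critical section
    state.applied.push_back(std::move(record));
    record.clear();  // leave the moved-from string in a known state
  }
}

// Runs the WAL and manifest tailers concurrently and waits for both.
inline void TailBoth(SharedState& state, const ReadNext& read_wal,
                     const ReadNext& read_manifest) {
  std::thread wal_thread([&] { TailLoop(state, read_wal); });
  std::thread manifest_thread([&] { TailLoop(state, read_manifest); });
  wal_thread.join();
  manifest_thread.join();
}
```

Because neither thread holds the mutex while blocked in its read, a stalled manifest read does not delay WAL records from being applied, and vice versa.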

I'm not familiar with the internal complexity of the secondary instance and cannot contribute a PR myself, but I would like to see a solution for this feature.

Another question: PurgeObsoleteFiles may not be needed in TryCatchUpWithPrimary. Because the files are shared with the primary, they should be purged by the primary (with a retention period).