Design: Consistent mesh for page-level mutation logs
losfair opened this issue · 3 comments
Currently the cache invalidation logic depends on FoundationDB. After each call to `/batch/commit`, we store the set of pages mutated by the transaction into a special keyspace in FDB, keyed by the commit versionstamp. When a later request to `/stat` comes in, keys in the range `(client_last_known_commit_version, current_read_version)` are scanned and returned to the client so that they can invalidate their caches.
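To make the current scheme concrete, here is a minimal in-memory sketch of a mutation log keyed by commit version. It is not the mvstore code: `MutationLog`, `record_commit` and `pages_changed_between` are hypothetical names, plain integers stand in for FDB versionstamps, and both range bounds are treated as exclusive only because that is how the range is written above (the real endpoint handling may differ).

```python
import bisect


class MutationLog:
    """In-memory stand-in for the FDB keyspace keyed by commit version (assumption)."""

    def __init__(self):
        self._versions = []  # sorted commit versions
        self._pages = {}     # commit version -> set of mutated page numbers

    def record_commit(self, version, pages):
        # Called after a commit: remember which pages that transaction touched.
        if version not in self._pages:
            bisect.insort(self._versions, version)
            self._pages[version] = set()
        self._pages[version] |= set(pages)

    def pages_changed_between(self, last_known, read_version):
        # Scan the open range (last_known, read_version) and merge the page sets.
        lo = bisect.bisect_right(self._versions, last_known)
        hi = bisect.bisect_left(self._versions, read_version)
        changed = set()
        for version in self._versions[lo:hi]:
            changed |= self._pages[version]
        return changed
```

A client that last saw commit version 10 and is now reading at version 30 would call `pages_changed_between(10, 30)` and drop the returned pages from its cache.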
This results in a lot of temporary data that is written once and read only a few times. Sending this data through the entire FDB transaction and storage systems is wasteful, and a better design is possible.
The proposed design is to form a mesh network of all mvstore instances so that they can synchronize cache invalidation logs directly and keep them entirely in memory. Metadata is still backed by FDB to keep things consistent.
Do you need any help with this?
@fire I haven't been able to get back to this yet... The idea I have in mind is to select one mvstore process in the cluster as the "write proxy" and let all mutations go through it. The proxy keeps a recent window (10 min?) of mutated page numbers in memory. Readers can then query the proxy and invalidate their caches accordingly. If the write proxy cannot provide this information (for example, if a new proxy has just been elected), the reader invalidates its entire cache.
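A rough sketch of that idea, under several assumptions: the proxy and reader live in one process, versions are plain integers, the retention window is passed in seconds, and every name here (`WriteProxy`, `ReaderCache`, `changes_since`, `sync`) is hypothetical. It only illustrates the fallback rule: when the proxy cannot cover the reader's range, the reader drops everything.

```python
from collections import deque


class WriteProxy:
    def __init__(self, generation, start_version, retention_secs=600.0):
        self.generation = generation
        self.floor = start_version   # every commit above this version is still in the log
        self.latest = start_version  # newest commit version seen by this proxy
        self.retention_secs = retention_secs
        self.log = deque()           # (commit_version, timestamp, pages), in commit order

    def record_commit(self, version, pages, now):
        self.log.append((version, now, frozenset(pages)))
        self.latest = version
        self._trim(now)

    def _trim(self, now):
        # Forget entries older than the retention window and raise the floor.
        while self.log and now - self.log[0][1] > self.retention_secs:
            version, _, _ = self.log.popleft()
            self.floor = version

    def changes_since(self, generation, last_known, now):
        # Returns None when the proxy cannot answer for this reader.
        self._trim(now)
        if generation != self.generation or last_known < self.floor:
            return None
        changed = set()
        for version, _, pages in self.log:
            if version > last_known:
                changed |= pages
        return changed


class ReaderCache:
    def __init__(self):
        self.pages = {}          # page number -> cached contents
        self.generation = None   # proxy generation this cache was last synced with
        self.last_known = 0

    def sync(self, proxy, now):
        changed = proxy.changes_since(self.generation, self.last_known, now)
        if changed is None:
            self.pages.clear()   # new proxy or expired history: invalidate everything
        else:
            for page in changed:
                self.pages.pop(page, None)
        self.generation = proxy.generation
        self.last_known = proxy.latest
```

In a real deployment the query would be a network call and the proxy would need to return its latest version atomically with the changed set; the sketch skips both.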
Election of the write proxy happens through FDB. The FDB read version can be used to fence between write proxy generations. How to scale the role is still an open question though (maybe shard it like what FDB does for itself?)
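One way the fencing could look, as a toy model only: `MetadataStore` is a hypothetical in-memory stand-in for the FDB-backed metadata, not an FDB API, and the rule "generation = read version at election time" is my reading of the comment above rather than a settled design. A commit is accepted only when it carries the generation of the currently elected proxy, so a deposed proxy cannot keep writing.

```python
class StaleProxyError(Exception):
    """Raised when a proxy from an older generation tries to commit."""


class MetadataStore:
    """Toy stand-in for FDB metadata: one version counter and one proxy record."""

    def __init__(self):
        self.version = 0
        self.proxy = None  # (generation, owner) of the current write proxy

    def elect(self, owner):
        # Assumption: the generation is the read version observed at election time.
        generation = self.version
        self.proxy = (generation, owner)
        self.version += 1  # the election itself is a commit, so versions move forward
        return generation

    def commit_as(self, generation, owner):
        # Fence: reject commits from anyone but the currently elected proxy.
        if self.proxy != (generation, owner):
            raise StaleProxyError(f"{owner} at generation {generation} was superseded")
        self.version += 1
        return self.version
```

Because every election and commit advances the version, a later election always gets a strictly larger generation, which readers can compare against the generation they last synced with.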