Perf: Concurrent MARF usage by the Clarity VM
Each Clarity VM instance operates on an independent view of the chainstate: the VM instance only makes changes to the “next” MARF trie, which is itself not written to (at all) until the Clarity VM instance “commits”. However, the implementation of the Clarity VM doesn’t take much advantage of this: it opens write transactions immediately, even if the VM instance is never going to commit (as in RPC calls and various invocations throughout the codebase, like reading the stackerdb configuration). This forces the stacks-node into a lot of serialization even when the stacks-node has capacity for concurrency (e.g., handling RPC calls at the same time it’s handling block processing). This is particularly costly for signers: they rely on lots of RPC calls, but their node must also stay as up to date as possible.
The fix here, I think, is some sensitive refactoring of the way the Clarity VM instantiates its transactions. Perhaps adding a “defer_tx” flag to Clarity VM instantiation will be necessary, so that when it opens its transaction, it doesn’t immediately try to obtain a write tx. Then, certain kinds of writes (possibly metadata writes?) will need to be deferred until the commit (this I think is already done through the KeyValueWrapper used to unroll public function calls, etc. on error). The efficacy of this approach could be tested by initiating and holding a write transaction in one thread while making sure that another thread can execute the Clarity VM on the same chainstate.
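A minimal standalone sketch of that test, using rusqlite directly rather than the node's actual chainstate APIs (the `tries` table, file name, and timings here are made up, and it assumes the DB is in WAL mode, where readers are not blocked by a writer):

```rust
use rusqlite::{Connection, OpenFlags};
use std::thread;
use std::time::Duration;

fn main() -> rusqlite::Result<()> {
    // Hypothetical stand-in for the MARF's sqlite DB.
    let path = "marf.sqlite";
    let writer = Connection::open(path)?;
    // WAL mode lets readers proceed while a writer holds its transaction.
    let _mode: String = writer.query_row("PRAGMA journal_mode=WAL", [], |r| r.get(0))?;
    writer.execute("CREATE TABLE IF NOT EXISTS tries (block TEXT, data BLOB)", [])?;

    // Thread 1: obtain and hold a write tx, as block processing would.
    let held = thread::spawn(move || {
        let tx = writer.unchecked_transaction().unwrap();
        tx.execute("INSERT INTO tries VALUES ('next-block', x'00')", []).unwrap();
        thread::sleep(Duration::from_secs(2)); // hold the write lock
        tx.commit().unwrap();
    });
    thread::sleep(Duration::from_millis(100));

    // Thread 2 (here, the main thread): a read-only connection -- standing in
    // for a "defer_tx" Clarity instance -- must not block on the held write tx.
    let reader = Connection::open_with_flags(path, OpenFlags::SQLITE_OPEN_READ_ONLY)?;
    let n: i64 = reader.query_row("SELECT COUNT(*) FROM tries", [], |r| r.get(0))?;
    println!("read {} committed rows while the write tx was still held", n);

    held.join().unwrap();
    Ok(())
}
```

If the read completes immediately rather than after the two-second hold, the two paths are genuinely concurrent.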
Note: pushing this even further, both the signer’s proposal evaluation and the miner’s block assembly do commit their Clarity instances, but do it to state that is never looked at again. In theory, it should be possible to make both of these (most importantly, probably the signer’s proposal evaluation) also “read-only”.
In the case of, for example, signer proposal evaluation - couldn't concurrency mess this up? For example, we're not going to commit any of the state changes in that block, but if state changes happened during the proposal eval (from elsewhere), it could invalidate the evaluation (even though the eval will return "ok")?
Or, and this just shows my lack of knowledge on how this works, is the proposal eval building off of a "leaf node", and so is some other process (like committing a new block), but the other process doesn't mutate the leaf node?
Separately, do we have any traces/benchmarks that indicate how long a commit lock typically lasts, e.g. for appending a block? I'm guessing it's significant (or at least sometimes is).
> In the case of, for example, signer proposal evaluation - couldn't concurrency mess this up? For example, we're not going to commit any of the state changes in that block, but if state changes happened during the proposal eval (from elsewhere), it could invalidate the evaluation (even though the eval will return "ok")?
No, because the MARF is append-only. A concurrent commit can only write new trie data at the end of the MARF's on-disk representation, outside of where a read-only connection would be reading.
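A toy illustration of why that is safe (the file name and contents are made up; the real `.blobs` layout is more involved): a reader pinned to a previously recorded offset is untouched by a concurrent append.

```rust
use std::fs::OpenOptions;
use std::io::{Read, Seek, SeekFrom, Write};

fn main() -> std::io::Result<()> {
    let mut blobs = OpenOptions::new()
        .create(true)
        .read(true)
        .append(true)
        .open("toy.blobs")?;

    // "Commit" a parent trie, recording the offset where it begins.
    let parent_offset = blobs.seek(SeekFrom::End(0))?;
    blobs.write_all(b"parent-trie")?;

    // A concurrent commit can only append new trie data after it.
    blobs.write_all(b"child-trie!")?;

    // A reader working from the recorded offset still sees the parent
    // trie intact: appends never overwrite previously written bytes.
    let mut reader = OpenOptions::new().read(true).open("toy.blobs")?;
    reader.seek(SeekFrom::Start(parent_offset))?;
    let mut buf = [0u8; 11];
    reader.read_exact(&mut buf)?;
    assert_eq!(&buf, b"parent-trie");
    Ok(())
}
```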
> is the proposal eval building off of a "leaf node", and so is some other process (like committing a new block), but the other process doesn't mutate the leaf node?
At a high level, block processing works as follows:
1. Open a transaction on the MARF's sqlite DB.
2. Open the `.blobs` file for the MARF, which contains the concatenation of all tries ever written. Seek to the offset into the `.blobs` file where the root of the parent block's trie can be found.
3. Compute the block's new trie by evaluating each transaction in order. Trie reads occur relative to the parent block's trie and can only access leaves in that trie or any of its ancestors (but never its children's leaves).
4. Buffer all new data into a trie in RAM.
5. "Seal" the trie -- compute the intermediate and root hashes.
6. Commit the trie -- serialize it to bytes and append it to the MARF's trie file (the `.blobs` file).
7. Commit the trie metadata -- this includes the root hash of the trie, the root hash of the parent, and the offset into the `.blobs` file where the trie data can be found.
Block proposals only do steps 1-4. They simply drop the RAM-based trie at the end of the block evaluation. Because block-processing only appends new trie data, it'll never overwrite data that the block-proposal logic is reading.
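Putting the two paths side by side, a condensed, hypothetical skeleton (the `RamTrie`/`MarfTx` names and shapes are illustrative stand-ins, not the actual stacks-core types):

```rust
// Illustrative skeleton only: RamTrie, MarfTx, and their methods are made-up
// stand-ins for the real MARF machinery in stacks-core.

#[allow(dead_code)]
struct RamTrie {
    sealed_root: Option<[u8; 32]>,
}

impl RamTrie {
    fn new() -> Self {
        RamTrie { sealed_root: None }
    }
    // Steps 3-4: evaluate each transaction, buffering new leaves in RAM;
    // reads resolve against the parent trie and its ancestors.
    fn evaluate_block(&mut self, _txs: &[Vec<u8>]) {}
    // Step 5: compute the intermediate and root hashes.
    fn seal(&mut self) {
        self.sealed_root = Some([0u8; 32]);
    }
}

struct MarfTx;

impl MarfTx {
    // Step 1: open the sqlite tx. Step 2: open `.blobs`, seek to parent root.
    fn begin_at_parent(_parent_block: &str) -> Self {
        MarfTx
    }
    // Steps 6-7: append the serialized trie to `.blobs`, then record the
    // root hash, parent root hash, and offset in the metadata.
    fn commit(self, _trie: RamTrie) {}
}

fn process_block(parent: &str, txs: &[Vec<u8>]) {
    let tx = MarfTx::begin_at_parent(parent);
    let mut trie = RamTrie::new();
    trie.evaluate_block(txs);
    trie.seal();
    tx.commit(trie); // only ever appends -- never overwrites existing tries
}

fn evaluate_proposal(parent: &str, txs: &[Vec<u8>]) {
    let _tx = MarfTx::begin_at_parent(parent);
    let mut trie = RamTrie::new();
    trie.evaluate_block(txs);
    // Steps 1-4 only: `trie` is dropped here without being written, so a
    // concurrent append by block processing can't disturb what it read.
}

fn main() {
    process_block("parent-block-id", &[]);
    evaluate_proposal("parent-block-id", &[]);
}
```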
Will need a genesis sync to test