How to prune the deduped trie nodes?
thor/muxdb/internal/trie/trie.go, lines 385-386 @ 6aa5f94
I can't find any code that prunes the deduped nodes. How are they pruned? Or is it not necessary?
The trie pruning consists of two steps:
- dump the trie nodes into the deduped space
- clean the history nodes

thor/muxdb/internal/trie/trie.go, line 422 @ 6aa5f94

The full procedure can be found here: thor/cmd/thor/optimizer/optimizer.go, line 222 @ 6aa5f94
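For illustration only, a minimal sketch of that two-step flow. All names here (`Trie`, `DumpNodes`, `HistoryStore`, `DeleteRange`) are hypothetical stand-ins, not the actual muxdb/optimizer APIs:

```go
package prune

// Illustrative interfaces only; the real logic lives in
// muxdb/internal/trie and cmd/thor/optimizer.
type Trie interface {
	// DumpNodes copies the live nodes created within [from, to)
	// from the history space into the deduped space.
	DumpNodes(from, to uint32) error
}

type HistoryStore interface {
	// DeleteRange removes all history nodes whose block number
	// falls within [from, to).
	DeleteRange(from, to uint32) error
}

// pruneTrie performs the two steps: dump, then clean history.
func pruneTrie(t Trie, hist HistoryStore, from, to uint32) error {
	// Step 1: persist the still-referenced nodes into the deduped space.
	if err := t.DumpNodes(from, to); err != nil {
		return err
	}
	// Step 2: the history nodes in [from, to) are now redundant;
	// drop them as one contiguous key range.
	return hist.DeleteRange(from, to)
}
```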
Yes, I understood the two steps. I meant that when we dump the trie nodes into the deduped space, the deduped space keeps growing; is it necessary to prune it as well?
Not necessary. Store keys of nodes in the deduped space are path-based, and new nodes will overwrite old ones.
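A minimal sketch of what path-based keying implies, with a hypothetical `dedupedSpace` prefix (not the real muxdb key scheme): writing a newer node for the same path replaces the older record in place, so no separate pruning pass is needed.

```go
package main

import "fmt"

// Hypothetical one-byte space prefix; the real key layout differs.
const dedupedSpace = byte(0x01)

// dedupedKey derives the store key from the node's path only, so two
// versions of a node at the same path map to the same key.
func dedupedKey(path []byte) string {
	return string(append([]byte{dedupedSpace}, path...))
}

func main() {
	store := map[string][]byte{}
	store[dedupedKey([]byte{0x1, 0x2})] = []byte("node@block100")
	// A later dump of the node at the same path overwrites the old one.
	store[dedupedKey([]byte{0x1, 0x2})] = []byte("node@block200")
	fmt.Println(len(store)) // 1 entry: the old version is gone
}
```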
But then deleted nodes will never be removed properly. Or do you assume that nodes are never deleted?
What do you mean by deleted nodes?
For a trie, we may also delete a node, not just update it.
> Not necessary. Store keys of nodes in the deduped space are path-based, and new nodes will overwrite old ones.
My doubt is how to handle old nodes that are deleted but never overwritten by new nodes. Just ignore them and let them accumulate?
> For a trie, we may also delete a node, not just update it.
Trie ops never delete nodes.
> My doubt is how to handle old nodes that are deleted but never overwritten by new nodes. Just ignore them and let them accumulate?
95% of the nodes are pruned; who cares about the remaining 0.00001%?
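To unpack "trie ops never delete nodes": in an append-only node store, removing a key from the trie restructures the nodes along the affected path and persists them as new records; the obsolete records merely become unreachable from the new root. A contrived sketch, with all names hypothetical:

```go
package main

import "fmt"

// Hypothetical append-only node store: trie updates and key deletions
// only ever write new node records; nothing is removed from the store.
type nodeStore map[string][]byte

// commit persists the nodes produced by a trie operation. Note there is
// no delete path: removing a key from the trie rewrites the nodes along
// the affected path as new records, leaving the old ones behind
// (unreachable from the new root).
func (s nodeStore) commit(newNodes map[string][]byte) {
	for k, n := range newNodes {
		s[k] = n
	}
}

func main() {
	s := nodeStore{}
	s.commit(map[string][]byte{"root@1": []byte("...")}) // initial trie
	// "Delete" a key: a new root (and path nodes) are written; root@1 stays.
	s.commit(map[string][]byte{"root@2": []byte("...")})
	fmt.Println(len(s)) // 2: old nodes accumulate until pruning
}
```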
Your comments in another issue (#23427):

> The key point of my idea is: trie nodes with the same path are duplicated, and older ones can be safely deleted.

> Till now, with the blockNum prefix, we know which trie node is older, but we have no idea about its position, that is, its path. Re-tweak the node persistence KV like this:

> `path || big-endian-uint32(blockNum) || hash => RLP(node) || RLP(child hash node blockNums)`

> Assume we want to prune duplicated trie nodes between block numbers [M, N), where N satisfies the "chain immutability threshold", and there is a node iterator which can filter out trie nodes by block number. The pseudocode will be:
```go
mask := make(map[string]uint32)

// Filter out trie(N)'s nodes produced after block M, and construct the
// map from path to the block number the node belongs to.
it := NewTrie(rootN).NodeIterator(func(bn uint32) bool { return bn > M })
for it.Next() {
	mask[string(it.Path())] = it.BlockNum()
}

dbIt := db.NewIterator(trieSpacePrefix)
for dbIt.Next() {
	key := dbIt.Key()
	// Key layout: path || big-endian-uint32(blockNum) || hash.
	path := key[:len(key)-32-4]
	bn := binary.BigEndian.Uint32(key[len(path) : len(path)+4])
	// The current node is older than trie(N)'s node at the same path.
	if bn < mask[string(path)] {
		db.Delete(key)
	}
}
```
In the pseudocode, the old dangling nodes are in fact cleaned properly. But in the actual implementation, you use an incremental copy to avoid a full traversal of the trie nodes. That's a wise choice, but it leads to incomplete deletion.
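For concreteness, hedged helpers for the key layout quoted above (`path || big-endian-uint32(blockNum) || hash`); the 32-byte hash length and the function names are assumptions for illustration, not Thor's actual encoding:

```go
package keys

import "encoding/binary"

const hashLen = 32 // assumed node-hash length

// encodeKey builds: path || big-endian-uint32(blockNum) || hash.
func encodeKey(path []byte, blockNum uint32, hash []byte) []byte {
	key := make([]byte, len(path)+4+hashLen)
	n := copy(key, path)
	binary.BigEndian.PutUint32(key[n:], blockNum)
	copy(key[n+4:], hash)
	return key
}

// decodeKey splits a key back into its three parts.
func decodeKey(key []byte) (path []byte, blockNum uint32, hash []byte) {
	cut := len(key) - hashLen - 4
	path = key[:cut]
	blockNum = binary.BigEndian.Uint32(key[cut : cut+4])
	hash = key[cut+4:]
	return
}
```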
> 95% of the nodes are pruned; who cares about the remaining 0.00001%?
I'm not challenging your implementation, just raising my doubts. If you have considered this situation and think it's not a problem, that's fine. My opinion is that if some dangling nodes can never be cleaned, they may accumulate more and more. On the other hand, the EVM has a gas-refund mechanism for users who delete (zero out) slots or accounts, which encourages users to free up useless space. So cleaning dangling nodes is meaningful.
Meanwhile, it is not difficult to solve the problem (which may not be a problem at all). We can iterate the trie nodes in the deduped space and delete those that do not belong to the trie of the deduped root. This can be done over a longer period (e.g., every month).
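A hedged sketch of that periodic sweep as mark-and-sweep; the iterator and store interfaces are assumed shapes for illustration, not the real muxdb APIs:

```go
package prune

// Assumed iterator/store shapes; not the real muxdb interfaces.
type trieIterator interface {
	Next() bool
	Path() []byte // path of the current trie node
}

type dbIterator interface {
	Next() bool
	Key() []byte
}

type kvStore interface {
	NewIterator(prefix []byte) dbIterator
	Delete(key []byte) error
}

// sweepDeduped is a mark-and-sweep pass over the deduped space: mark
// every path reachable from the deduped root's trie, then delete any
// stored record whose path is not in the mark set.
func sweepDeduped(rootIt trieIterator, store kvStore, prefix []byte) error {
	live := make(map[string]struct{})
	for rootIt.Next() { // mark phase
		live[string(rootIt.Path())] = struct{}{}
	}
	dbIt := store.NewIterator(prefix)
	for dbIt.Next() { // sweep phase
		key := dbIt.Key()
		path := key[len(prefix):] // assumed layout: prefix || path
		if _, ok := live[string(path)]; !ok {
			if err := store.Delete(key); err != nil {
				return err
			}
		}
	}
	return nil
}
```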
This is a trade-off for deletion performance (deleting a contiguous range costs less disk write amplification), and we may switch from goleveldb to an engine that supports range deletion.
BTW, even if all redundant nodes are deleted, the disk space won't be reclaimed immediately.
> deleting a contiguous range costs less disk write amplification
Yes. For the history node space, we can delete a contiguous range. But for the redundant nodes in the deduped space, we need to iterate the trie nodes and delete those that do not belong to the trie of the deduped root. This can be done over a longer period (e.g., every month), so it has less impact.
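A sketch of the contiguous range deletion for the history space using goleveldb, which has no native range deletion, so the range is iterated and batch-deleted; the `spacePrefix || big-endian block number` key layout is an assumption for illustration:

```go
package prune

import (
	"encoding/binary"

	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/util"
)

// deleteHistoryRange removes all history-space keys for blocks
// [from, to). Because the assumed keys are ordered by block number,
// the whole range is one contiguous span of the keyspace.
func deleteHistoryRange(db *leveldb.DB, spacePrefix byte, from, to uint32) error {
	start := append([]byte{spacePrefix}, be32(from)...)
	limit := append([]byte{spacePrefix}, be32(to)...)

	it := db.NewIterator(&util.Range{Start: start, Limit: limit}, nil)
	defer it.Release()

	batch := new(leveldb.Batch)
	for it.Next() {
		batch.Delete(it.Key()) // Batch copies the key bytes
	}
	if err := it.Error(); err != nil {
		return err
	}
	return db.Write(batch, nil)
}

func be32(n uint32) []byte {
	var b [4]byte
	binary.BigEndian.PutUint32(b[:], n)
	return b[:]
}
```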
It's worth it if there's a lot of space that can be reclaimed.