/Pfm

Performant, persistent and concurrent dictionary for C# / .NET

Primary LanguageC#MIT LicenseMIT

Pfm - Persistence for the masses

Immutable collections are "all the rage" these days, for good reasons. Their story in .NET, however, is very fragmented:

However, there are no mutable collections with cheap "copy on write" semantics.

Library design

Experiments with abstract statics in interfaces to implement traits-like design as commonly seen in C++. The technique is applied to model binary search trees with reduced overheads (see benchmarks below).

The solution consists of three assemblies:

  • Pfm.Collections contains various data structure implementations
  • Pfm.Test has (somewhat messy) correctness tests
  • Pfm.Benchmark has benchmarks.

This project has been inspired by "Persistence for the Masses" paper, hence also the name.

Collections

Collection namespaces are divided by data structure and implementation technique:

  • Trie.DenseTrie implements persistent vector supporting direct access and one-sided push/pop operations (akin to Clojure vectors)
  • TreeSet implements persistent joinable balanced trees, in AVL and weight-balanced variants. In addition to usual sorted set operations, it also provides iteratos for forward and backward navigation, fast (logarithmic) access to n'th element in sorted order, and user-defined monoidal "augmentations" that can be used to implement, for example, an interval tree.

Tests include cases that cover correctness of copy on write semantics. Benchmarks attempt to cover the cost of COW semantics.

Future work

I have for a long time been annoyed by the fact that the standard Dictionary<K,V> does not support set operations (intersection, difference, union), even though it is an ISet<K>. TreeSet is a building block that can be used to implement a MergeableDictionary<K,V> that would support set operations with configurable merging of equivalent values. Efficient bulk-build from a sorted sequence would also be possible to implement.

Sparse trie - Bagwell's HAMT.

Benchmarks

Method:

  • Build the solution in Release mode
  • Run Pfm.Benchmark.exe from a console elevated to admin (needed to collect HW perf counters)

Environment:

BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.22621.963) 12th Gen Intel Core i7-12700, 1 CPU, 20 logical and 12 physical cores .NET SDK=7.0.100 [Host] : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2 DefaultJob : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2

SortedSet is used as a reference implementation. The underlying implementation is a RB-tree which does less work for insertions and deletions, but results in deeper trees.

SequencePatternBenchmark

This benchmark tries to find the "worst" sequence of insertions and deletions for a given tree implementation. Results (not shown) indicate that random/random is among the top 5 of 49 combinations, so it is used throughout the other benchmarks.

The following benchmarks use a random sequence of 8197 elements.

TreeModifyBenchmark

This benchmark inserts a random sequence into the tree, then removes the elements in reverse order of insertion.

Method Mean Error StdDev BranchInstructions/Op InstructionRetired/Op CacheMisses/Op Gen0 Gen1 Allocated
AvlTreeSet 2.806 ms 0.0203 ms 0.0190 ms 5,132,036 26,197,461 5,660 27.3438 7.8125 384.29 KB
WBTreeSet 2.389 ms 0.0068 ms 0.0060 ms 4,860,134 27,351,823 5,278 27.3438 7.8125 384.29 KB
AvlCOWSet 2.949 ms 0.0085 ms 0.0079 ms 5,216,734 26,441,406 8,294 62.5000 39.0625 833.37 KB
WBCOWSet 2.584 ms 0.0139 ms 0.0123 ms 4,965,466 27,795,312 8,067 62.5000 31.2500 827.98 KB
SortedSet 2.206 ms 0.0084 ms 0.0078 ms 3,708,473 11,000,260 4,551 23.4375 7.8125 320.24 KB
ImmutableSet 6.636 ms 0.0581 ms 0.0485 ms 12,854,511 54,083,333 43,002 757.8125 460.9375 9713.33 KB

"COW" benchmarks attempt to show the cost of COW semantics where the whole tree is rebuilt twice. These benchmarks proceed as follows:

  • First, only even numbers from the sequence are inserted into the tree.
  • Then, a COW copy of the tree is made and all odd numbers are inserted into the copy.
  • Then, another COW copy is made and all elements are removed in reverse order of insertion.

TreeFindBenchmark

This benchmark inserts a random sequence into the tree, then searches for each inserted element in increasing order (0, 1, ..., max).

Method Mean Error StdDev CacheMisses/Op BranchInstructions/Op InstructionRetired/Op
AvlTreeSet 298.3 us 0.87 us 0.81 us 87 375,178 1,208,659
WBTreeSet 300.1 us 0.75 us 0.66 us 79 383,311 1,229,199
SortedSet 474.3 us 2.20 us 2.06 us 128 1,050,059 2,901,164
ImmutableSet 465.4 us 1.92 us 1.70 us 103 1,169,875 2,992,643

Join tree has even better lookup performance than standard SortedSet and ImmutableSet.

VectorModifyBenchmark

This benchmark creates a vector of 16384 elements and adds 1 to each element.

Method Mean Error StdDev BranchInstructions/Op InstructionRetired/Op CacheMisses/Op
List 13.12 us 0.028 us 0.027 us 49,407 280,485 2
ImmutableList 5,138.95 us 20.923 us 19.571 us 9,529,344 43,057,812 57,847
DenseTrie 392.71 us 0.823 us 0.729 us 1,332,262 6,028,385 54

As expected, the built-in mutable list has the best performance. Still, COW DenseTrie has significantly better performance than the built-in immutable list.

Remarks

TreeSet has significantly better lookup performance than SortedSet. I ascribe this to two factors:

  • Sorted set uses internally a red-black tree, which is on average deeper than WB or AVL trees.
  • Inspection of the generated assembly shows that JIT is able to inline key comparison method when implemented by a abstract static interface method. This is not the case for SortedSet where the comparison method is a delegate.

I have attempted to convert recursive tree algorithms to iterative algorithms, using TreeIterator as a manually maintained stack. Surprisingly, the result was slower due to frequent calls to CORINFO_HELP_ASSIGN_REF, which doesn't happen when the reference is pushed onto the stack during recursion. See dotnet/runtime#59031 This is also the reason for using ulong for the node's transient tag instead of object (as is suggested in the papers).