/AdvancedDataStructures

A collection of exotic & advanced data structures in C++ and Haskell.

Primary LanguageC++

AdvancedDataStructures

A collection of exotic & advanced data structures in C++ and Haskell.

  • Binary heap (static): A modification to the classic data structure, supporting decreaseKey and optimized specifically for Prim's and Dijkstra's algorithms: the total number of values is predetermined and no values are added/removed after initialization. Uses two "pointers" per node. Time complexities are O(lgn) for extractMin and decreaseKey and O(n) for construction/initialization (a trivial operation).
  • Binomial heap (meta): A variant of the classic heap data structure, supporting insertions in O(1) amortized time and O(lgn) for all other operations. The heaps & supporting structures carry their size in their type, which makes it impossible (!) to construct an invalid binomial heap or binomial tree - it is verified at compile time. It also serves as a correctness proof for the heap operations, as well as their helper functions.
  • d-ary heap: a generalisation of the binary heap, where every node has d children instead of 2. While having basically the same structure and idea as the binary heap, it has better cache behaviour and performs decreaseKey quicker, making it a slightly better choice for most purposes. The 4-heap, specifically, stands out as a good performer. The aforementioned decreaseKey is still messy to implement, though, which is why it's missing (for now). All heap modification operations now run in O(logdn) time in the worst case.
  • FixedEytzingerMap: a static cache-friendly STL-compatible associative container that packs all keys in an array, similarly to a flat_map. Instead of a sorted order for lookup via binary search, it uses the Eytzinger layout to drastically reduce cache misses for faster lookup. Insertion/deletion is slow & cumbersome, which is why the container is static - only the values can be modified, the keys not. Adapted from Michael Kazakov's implementation.
  • PairingHeap: a rather simple, fast and efficient multi-way heap, regarded as a simplified Fibonacci heap. Theoretically slower, but in practice much simpler and therefore more efficient than Fibonacci & Binomial heaps, this is most often the best choice for classical algorithms such as Prim's and Dijkstra's. Has a user-friendly interface with decreaseKey functionality. Time complexities for insert, extractMin and decreaseKey are O(1), O(lgn) and O(lgn) amortized, respectively. The pairing_heap_static is a variant of this structure, optimized precisely for use in the aforementioned algorithms (similarly to binary_heap_static): the total number of values is predetermined, memory is allocated once for all nodes and also deallocated once (instead of per-node), all nodes are contained inside this continuous space and no values are added/removed after initialization. See also.
  • PersistentVector: a persistent tree with a large branching factor (usually 32) that supports random access, push_back & pop_back in O(log32n) time (practically O(1)) via path-copying. All operations & internal structure sharing is thread-safe and memory-safe. Includes optimizations like transient (modifying) operations on rvalues, and keeping the last leaf node (the one most operations affect) outside of the tree structure. This means that 97% of the time push_back and pop_back will be done in real O(1) time with at most 1 extra allocation.
  • RangeMinimumQuery: an asymptotically optimal and practical solution to the problem of preprocessing a static array in order to efficiently query for the minimum of any subarray (or its position). Preprocessing is done in O(n) time and O(n) extra memory via a reduction RMQ->LCA->RMQ01, and the queries are then answered in O(1) - with all constants being very low. Uses the tagged integer utilites from this repo. Tests included.
  • SkewHeap: a self-adjusting, leftist heap where every operation is done via skew heap merging. Although there are no structural constraints, it has been shown that all heap modification operations work in O(lgn) amortized time. The C++ version is optimized to use a preallocated buffer for node storage, has reduced memory usage & only supports inserting values and replacing the minimum with a bigger value.
  • Splay tree: a self-adjusting binary search tree, in which every insertion or lookup of a value rotates it up (splays it) to the tree's root. This way recently accessed elements are quick to access again, making the structure suitable for some kind of cache. Due to careful rotations during splaying, the tree is kept relatively balanced, giving O(lgn) amortized time for insertion, lookup and removal.
  • Static hashset: hashing a static set of values in O(n) storage with no collisions. The values hashed are required to be in [0;p) for a prime number p (can be chosen according to the use case). The total space used is for approx. 5n elements. The only meaningful operation supported is querying whether or not a value is in the set and is performed in, of course, O(1) time.
  • Suffix array: an array of all the suffixes of a given string, in sorted order. Five algorithms are presented: the naive sorting (O(n2lgn)), Manber-Myers's O(nlgn) prefix doubling algorithm, the linear recursive DC3/Skew algorithm by Kärkkäinen & Sanders, and two implementations of the Induced-Sorting linear algorithm by Ge Nong, Sen Zhang & Wai Hong Chan.
  • Treap: tree + heap = treap. A binary search tree where every node has a randomly generated "priority". The values in the nodes form a binary search tree, while the priorities in the nodes form a binary heap at the same time. Relies on rotations for moving values (with their respective priorities) up and down the structure, while preserving the BST invariant. Therefore the tree is roughly well balanced and all modifying operations have O(lgn) expected time complexity. Suitable for general-purpose use, such as sets, maps, multisets etc.
  • Trie: to-do
  • X-fast trie: a static data structure, supporting searching for a value's predecessor/successor in a given, fixed set of values in just O(lglgm) time (where m is the maximum value in the set). Constructed in O(nlgm) time and takes O(n) space. The name comes from the trie, constructed during initialization. Due to being static (i.e. no addition/removal of values after construction) the "pointers" from each node in the trie are replaced by indices, used during the search operations, and as a result the finished structure does not actually contain a trie. In order to hit O(lglgm) as a hard, non-amortized bound, it relies on being able to hash perfectly a given, fixed set of integers (currently not supported).

To-do:

  • fix existing structures
  • add more structures, such as: Treap and SkewHeap in C++, Persistent Vector, Tiered Vector, (Indexable) SkipList, dense hashset, perfect hashset, perfect dynamic hashset, Binomial Heap, Splay trees, B-trees, Tries (string, Radix & Patricia and made persistent), Rope, Suffix Tree, Suffix Array, k-d tree, Fibonacci heap and many more...