/fca_unordered

Proof of concept of closed-addressing, O(1)-erase, standards-compliant unordered associative containers.

Primary LanguageC++

fca_unordered

Proof of concept of closed-addressing, O(1)-erase, standards-compliant unordered associative containers.

template<
  typename T,typename Hash=boost::hash<T>,typename Pred=std::equal_to<T>,
  typename Allocator=std::allocator<T>,
  typename SizePolicy=prime_size,typename BucketArrayPolicy=grouped_buckets
>
class fca_unordered_set;

template<
  typename Key,typename Value,
  typename Hash=boost::hash<Key>,typename Pred=std::equal_to<Key>,
  typename Allocator=std::allocator</* equivalent to std::pair<const Key,Value> */>,
  typename SizePolicy=prime_size,typename BucketArrayPolicy=grouped_buckets
>
class fca_unordered_map;

SizePolicy

Specifies how the bucket array grows and the algorithm used for determining the position of an element with hash value h in the array.

  • prime_size: Sizes are prime numbers in an approximately doubling sequence. position(h) = h % size, modulo operations are sped up by keeping a function pointer table with fpt[i](h) == h % size(i), where i ranges over the size sequence.
  • prime_switch_size: Same as before, but instead of a table a switch statement over i is used.
  • prime_fmod_size: position(h) = fastmod(high32bits(h) + low32bits(h), size). fastmod is a fast implementation of modulo for 32-bit numbers by Daniel Lemire.
  • prime_frng_size: position(h) = fastrange(h, size). Daniel Lemire's fastrange maps a uniform distribution of values in the range [0, size). This policy is ill behaved for low quality hash functions because it ignores the low bits of the hash value.
  • prime_frng_fib_size: Fixes pathological situations of prime_frng_size by doing positon(h) = fastrange(h * F, size), where F is the Fibonacci hashing constant.
  • pow2_size: Sizes are consecutive powers of two. position(h) returns the higher bits of the hash value, which, as it happens with prime_frng_size, works poorly for low quality hash functions.
  • pow2_fib_size: h is Fibonacci hashed before calculating the position.

BucketArrayPolicy

  • simple_buckets: The bucket array is a plain vector of node pointers without additional metadata. The resulting container deviates from the C++ standard requirements for unordered associative containers in two aspects:
    • Iterator increment is not constant but gets slower as the number of empty buckets grow; see N2023 for details.
    • Because of the former, erase(iterator) returns void instead of an iterator to the next element.
  • grouped_buckets: The resulting container is fully standards-compliant, including constant iteration and average constant erasure(iterator). Besides the usual bucket array, a vector of bucket group metadata is kept. Buckets are logically grouped in 32/64 (depending on the architecture) consecutive buckets: the associated bucket group metadata consists of a pointer to the first bucket of the group, a bitmask signalling which buckets are occupied, and prev and next pointers to link non-empty bucket groups in a bidirectional list. Going from a given bucket to the next occupied one is implemented as follows:
    • Use bit counting operations to determine, in constant time, the following occupied bucket within the group.
    • If there are no further occupied buckets in the group, go the next group (which is guaranteed to be non-empty).
    The memory overhead added by bucket groups is 4 bits per bucket.
template<
  typename T,typename Hash=boost::hash<T>,typename Pred=std::equal_to<T>,
  typename Allocator=std::allocator<T>
>
class fca_simple_unordered_set;

template<
  typename Key,typename Value,
  typename Hash=boost::hash<Key>,typename Pred=std::equal_to<Key>,
  typename Allocator=std::allocator</* equivalent to std::pair<const Key,Value> */>
>
class fca_simple_unordered_map;

Abandoned experiment where individual occupied buckets where linked in a bidirectional list. Outperformed by fca_unordered_[set|map] with grouped_buckets.