Proof of concept of closed-addressing, O(1)-erase, standards-compliant unordered associative containers.
- Development Plan for Boost.Unordered
- Benchmark results for this PoC
```cpp
template<
  typename T, typename Hash=boost::hash<T>, typename Pred=std::equal_to<T>,
  typename Allocator=std::allocator<T>,
  typename SizePolicy=prime_size, typename BucketArrayPolicy=grouped_buckets
>
class fca_unordered_set;

template<
  typename Key, typename Value,
  typename Hash=boost::hash<Key>, typename Pred=std::equal_to<Key>,
  typename Allocator=std::allocator</* equivalent to std::pair<const Key,Value> */>,
  typename SizePolicy=prime_size, typename BucketArrayPolicy=grouped_buckets
>
class fca_unordered_map;
```
## SizePolicy

Specifies how the bucket array grows and the algorithm used for determining the position of an element with hash value `h` in the array.

- `prime_size`: Sizes are prime numbers in an approximately doubling sequence. `position(h) = h % size`; modulo operations are sped up by keeping a function pointer table with `fpt[i](h) == h % size(i)`, where `i` ranges over the size sequence.
- `prime_switch_size`: Same as before, but a `switch` statement over `i` is used instead of a table.
- `prime_fmod_size`: `position(h) = fastmod(high32bits(h) + low32bits(h), size)`. `fastmod` is a fast implementation of modulo for 32-bit numbers by Daniel Lemire.
- `prime_frng_size`: `position(h) = fastrange(h, size)`. Daniel Lemire's `fastrange` maps a uniform distribution of hash values to a uniform distribution of positions in the range `[0, size)`. This policy is ill behaved for low-quality hash functions because it ignores the low bits of the hash value.
- `prime_frng_fib_size`: Fixes the pathological situations of `prime_frng_size` by doing `position(h) = fastrange(h * F, size)`, where `F` is the Fibonacci hashing constant.
- `pow2_size`: Sizes are consecutive powers of two. `position(h)` returns the high bits of the hash value, which, as happens with `prime_frng_size`, works poorly for low-quality hash functions.
- `pow2_fib_size`: `h` is Fibonacci hashed before calculating the position.
## BucketArrayPolicy

- `simple_buckets`: The bucket array is a plain vector of node pointers without additional metadata. The resulting container deviates from the C++ standard requirements for unordered associative containers in two respects:
  - Iterator increment is not constant-time but gets slower as the number of empty buckets grows; see N2023 for details.
  - Because of the former, `erase(iterator)` returns `void` instead of an iterator to the next element.
  - This policy is used to simulate `unordered_bucket_map` as specified in the Development Plan for Boost.Unordered.
- `grouped_buckets`: The resulting container is fully standards-compliant, including constant-time iteration and average constant-time `erase(iterator)`. Besides the usual bucket array, a vector of bucket group metadata is kept. Buckets are logically grouped in sets of 32/64 (depending on the architecture) consecutive buckets: the associated bucket group metadata consists of a pointer to the first bucket of the group, a bitmask signalling which buckets are occupied, and `prev` and `next` pointers linking the non-empty bucket groups in a bidirectional list. Going from a given bucket to the next occupied one is implemented as follows:
  - Use bit counting operations to determine, in constant time, the following occupied bucket within the group.
  - If there are no further occupied buckets in the group, go to the `next` group (which is guaranteed to be non-empty).
  - The memory overhead added by bucket groups is 4 bits per bucket.
```cpp
template<
  typename T, typename Hash=boost::hash<T>, typename Pred=std::equal_to<T>,
  typename Allocator=std::allocator<T>
>
class fca_simple_unordered_set;

template<
  typename Key, typename Value,
  typename Hash=boost::hash<Key>, typename Pred=std::equal_to<Key>,
  typename Allocator=std::allocator</* equivalent to std::pair<const Key,Value> */>
>
class fca_simple_unordered_map;
```
Abandoned experiment where individual occupied buckets were linked in a bidirectional list. Outperformed by `fca_unordered_[set|map]` with `grouped_buckets`.